I'm working on a compiler that translates a Java-like language to LLVM and I'm having problems with the getelementptr
instruction.
Say I have classes A and B defined as follows:
class A {int val;}
class B {A a;}
This gets translated to LLVM types:
%A = type {%A_vtable_type*, i32}
%B = type {%B_vtable_type*, %A*}
where the first elements are pointers to virtual tables and aren't important for this question.
I want to translate the following code:
A a;
B b;
b = new B;
b.a = new A;
b.a.val = 42;
The first four instructions work just fine and the fifth one gets translated to something like this:
%b = alloca %B*
...
%1 = load %B*, %B** %b
%2 = getelementptr %B, %B* %1, i32 0, i32 1, i32 1
store i32 42, i32* %2
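My understanding of these indices is this:
the first i32 0 steps through %1, which is of type %B*,
the following i32 1 selects field a of struct %B,
the last i32 1 selects field val through a, which is of type %A*.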
Now, I read https://llvm.org/docs/GetElementPtr.html and understood that a single GEP can perform several address computations without having to dereference the intermediate pointer types. However, after running
llvm-as-6.0 classes.ll -o classes.bc
I get
llvm-as-6.0: classes.ll:40:31: error: invalid getelementptr indices
The error goes away after removing the i32 0, but when I tried, for example, to create even more nested classes, I got a type error.
I find that GEP is best understood as a high-level way of doing pointer arithmetic. Your GEP tries to dereference B's pointer to an A, which GEP never does. GEP does not access memory; it merely adds offsets to the pointer you give it. Given a pointer to a complex object with a fixed layout, it returns a pointer to some field or entry within that object.
Your A isn't at a fixed offset within B. Rather, your A* is at a fixed offset within B and finding the address of the A requires reading B's A*.
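A minimal sketch of that sequence, in the typed-pointer syntax that the question's llvm-as-6.0 accepts: one GEP for the address of field a, a load to read the A*, then a second GEP for val. The empty vtable types and the function name @set_val are placeholders assumed here only to make the snippet self-contained.

```llvm
; Placeholder vtable types (assumed; the real ones are not shown in the question).
%A_vtable_type = type {}
%B_vtable_type = type {}
%A = type { %A_vtable_type*, i32 }
%B = type { %B_vtable_type*, %A* }

; Hypothetical wrapper: %b is the alloca'd slot holding the %B* reference.
define void @set_val(%B** %b) {
entry:
  %b.ref   = load %B*, %B** %b
  ; Address of field a inside the B object: pure pointer arithmetic, type %A**.
  %a.addr  = getelementptr %B, %B* %b.ref, i32 0, i32 1
  ; Reading the A* stored in that field is a memory access, so it needs a load.
  %a.ref   = load %A*, %A** %a.addr
  ; Address of field val inside the A object, type i32*.
  %val.addr = getelementptr %A, %A* %a.ref, i32 0, i32 1
  store i32 42, i32* %val.addr
  ret void
}
```

Newer LLVM releases use opaque pointers (`ptr`) instead of `%B*` and `%A*`, but the load between the two GEPs is needed there as well.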
If you were to use nested types (that is, if your B were to contain an A rather than a reference to an A) then you could use GEP and a B* to get the address of A's i32 (and afterwards store 42 into it).
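For comparison, a sketch of the nested-type case, where the three-index GEP from the question becomes valid because the A sits at a fixed offset inside the B. The type name %B_inline, the function name @set_val_inline, and the empty vtable types are assumptions made for illustration.

```llvm
; Placeholder vtable types (assumed; the real ones are not shown in the question).
%A_vtable_type = type {}
%B_vtable_type = type {}
%A = type { %A_vtable_type*, i32 }
; Hypothetical variant of B that contains an A by value instead of an A*.
%B_inline = type { %B_vtable_type*, %A }

define void @set_val_inline(%B_inline* %b) {
entry:
  ; i32 0: the object %b points to; i32 1: field a (an %A stored inline);
  ; i32 1: field val within that %A. No load is needed along the way.
  %val.addr = getelementptr %B_inline, %B_inline* %b, i32 0, i32 1, i32 1
  store i32 42, i32* %val.addr
  ret void
}
```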