I want to get the exact byte representation of nested struct/array datatypes. For example, take the following C struct:
typedef struct zTy {
int x;
char c[2];
struct { char d; } v;
} z;
It gets converted to the following LLVM IR:
%struct.zTy = type { i32, [2 x i8], %struct.anon }
%struct.anon = type { i8 }
%a = alloca %struct.zTy, align 4
The alloca instruction shows the alignment (4 bytes), but I don't know where padding is inserted or how the alignment of nested structs is calculated. I get the total size of the struct for my target triple using getTypeAllocSize():
AllocaInst* AI;
Module &M;
Type* T = AI->getAllocatedType();
uint64_t size = M.getDataLayout()->getTypeAllocSize(T); // 8 bytes
Is there a way to determine the exact layout of arbitrary nested datatypes for my target architecture from an LLVM pass?
This is ABI-specific, so it depends on the target. For C/C++, Clang generally computes a struct's alignment as the maximum of the alignments of its members. Here the int is the most strictly aligned field, with a default alignment of 4, which is what you get.
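To make that rule concrete, here is a minimal sketch that checks it against whatever ABI your host compiler uses. The helper names (roundUp, predictedSize, checkHostLayout) are made up for illustration, and the check only describes the host, not necessarily the target of your pass.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// The struct from the question.
struct zTy {
    int x;
    char c[2];
    struct { char d; } v;
};

// Round `offset` up to the next multiple of `align`.
std::size_t roundUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Hypothetical helper: size predicted by "place members in order, then pad".
std::size_t predictedSize() {
    std::size_t off = 0;
    off = roundUp(off, alignof(int)) + sizeof(int);       // x
    off = roundUp(off, alignof(char)) + sizeof(char[2]);  // c
    off = roundUp(off, alignof(char)) + sizeof(char);     // v.d
    return roundUp(off, alignof(zTy));                    // tail padding
}

// Hypothetical helper: compare the prediction with what the compiler did.
void checkHostLayout() {
    assert(alignof(zTy) == alignof(int));         // max of member alignments
    assert(offsetof(zTy, c) == sizeof(int));      // no padding needed before c
    assert(offsetof(zTy, v) == sizeof(int) + 2);  // v follows c directly
    assert(predictedSize() == sizeof(zTy));       // 8 where int is 4 bytes
}
```

On a target where int is 4 bytes with 4-byte alignment, the members occupy bytes 0 to 6 and one byte of tail padding brings the size to 8.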
Clang has a cc1 option, -fdump-record-layouts, that helps you figure out the layout of a struct/class, for example:
$ echo "struct zTy {
int x;
char c[2];
struct { char d; } v;
} z;" | clang -x c -w - -Xclang -fdump-record-layouts -c
*** Dumping AST Record Layout
0 | struct zTy::(anonymous at <stdin>:4:5)
0 | char d
| [sizeof=1, align=1]
*** Dumping AST Record Layout
0 | struct zTy
0 | int x
4 | char [2] c
6 | struct zTy::(anonymous at <stdin>:4:5) v
6 | char d
| [sizeof=8, align=4]
Inside LLVM you lose the "C" types, but you can still inspect a struct type through DataLayout:
const StructLayout *getStructLayout(StructType *Ty) const;
Then, using the returned StructLayout, you can get the offset of each element using:
uint64_t StructLayout::getElementOffsetInBits(unsigned Idx) const
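For arbitrarily nested types you would recurse: for a struct, add each element's offset from its StructLayout and descend into the element; for an array, step by the element's alloc size. The sketch below shows that recursion on a toy type model so it runs without LLVM. Ty, Leaf, and all the function names are hypothetical stand-ins, not LLVM classes, and the sizes and alignments passed to scalar() are assumptions for a target with a 4-byte int. In a real pass the corresponding queries would be DataLayout::getStructLayout, StructLayout::getElementOffset, and DataLayout::getTypeAllocSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy type model (hypothetical, not LLVM's Type hierarchy).
struct Ty {
    enum Kind { Scalar, Array, Struct };
    Kind kind = Scalar;
    uint64_t size = 0, align = 1;  // Scalar: size and ABI alignment in bytes
    uint64_t count = 0;            // Array: number of elements
    std::vector<Ty> elems;         // Array: the element type; Struct: the fields
};

Ty scalar(uint64_t size, uint64_t align) { Ty t; t.size = size; t.align = align; return t; }
Ty array(Ty elem, uint64_t n) { Ty t; t.kind = Ty::Array; t.count = n; t.elems = {elem}; return t; }
Ty record(std::vector<Ty> fields) { Ty t; t.kind = Ty::Struct; t.elems = fields; return t; }

uint64_t alignTo(uint64_t v, uint64_t a) { return (v + a - 1) / a * a; }

// Alignment of an aggregate is the max alignment of what it contains.
uint64_t abiAlign(const Ty &t) {
    if (t.kind == Ty::Scalar) return t.align;
    uint64_t a = 1;
    for (const Ty &e : t.elems) a = std::max(a, abiAlign(e));
    return a;
}

uint64_t allocSize(const Ty &t);

// Byte offset of each field of a (non-packed) struct.
std::vector<uint64_t> fieldOffsets(const Ty &s) {
    std::vector<uint64_t> offs;
    uint64_t off = 0;
    for (const Ty &e : s.elems) {
        off = alignTo(off, abiAlign(e));  // padding before the field
        offs.push_back(off);
        off += allocSize(e);
    }
    return offs;
}

// Size including tail padding, i.e. the stride between array elements.
uint64_t allocSize(const Ty &t) {
    if (t.kind == Ty::Scalar) return alignTo(t.size, t.align);
    if (t.kind == Ty::Array) return t.count * allocSize(t.elems[0]);
    uint64_t end = 0;
    std::vector<uint64_t> offs = fieldOffsets(t);
    if (!offs.empty()) end = offs.back() + allocSize(t.elems.back());
    return alignTo(end, abiAlign(t));
}

struct Leaf { uint64_t offset, size; };

// Collect the byte range of every scalar, however deeply nested.
void flatten(const Ty &t, uint64_t base, std::vector<Leaf> &out) {
    if (t.kind == Ty::Scalar) {
        out.push_back({base, t.size});
    } else if (t.kind == Ty::Array) {
        for (uint64_t i = 0; i < t.count; ++i)
            flatten(t.elems[0], base + i * allocSize(t.elems[0]), out);
    } else {
        std::vector<uint64_t> offs = fieldOffsets(t);
        for (size_t i = 0; i < t.elems.size(); ++i)
            flatten(t.elems[i], base + offs[i], out);
    }
}

// { i32, [2 x i8], { i8 } } from the question, assuming a 4-byte-aligned i32.
Ty makeZTy() {
    Ty i32 = scalar(4, 4), i8 = scalar(1, 1);
    return record({i32, array(i8, 2), record({i8})});
}
```

Calling flatten(makeZTy(), 0, leaves) under these assumptions yields scalars at byte offsets 0, 4, 5, and 6, with allocSize reporting 8, which matches the record layout dump above. Any byte not covered by a leaf is padding.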