LLVM alignment of nested structs/arrays

I want to get the exact byte representation of nested struct/array datatypes. For example the following C struct:

typedef struct zTy {
    int x;
    char c[2];
    struct  { char d; } v;
} z;

It gets converted to the following LLVM IR:

%struct.zTy = type { i32, [2 x i8], %struct.anon }
%struct.anon = type { i8 }

%a = alloca %struct.zTy, align 4

From the alloca instruction it is possible to see the alignment (4 bytes). But I don't know where this alignment comes from or how the alignment of nested structs is calculated. I get the total size of the struct for my target triple using getTypeAllocSize():

AllocaInst* AI;
Module &M;
Type* T = AI->getAllocatedType();
// Newer LLVM releases return the DataLayout by reference; use '.' instead of '->' there.
uint64_t size = M.getDataLayout()->getTypeAllocSize(T); // 8 bytes

Is there a way to determine the exact layout of arbitrarily nested datatypes for my target architecture from an LLVM pass?

Solution

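This is ABI-specific, so it depends on the target. For C/C++, Clang generally computes the alignment of a struct as the maximum of the alignments of its members, applied recursively to nested structs and arrays. Here the int is the member with the strictest alignment requirement (4 bytes by default), so that is the alignment you get. The members occupy 7 bytes, and the size is rounded up to a multiple of the alignment, which gives the 8 bytes you see.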
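To make the rule concrete, here is a small sketch that checks it against the host compiler for the struct from the question. It only uses standard C++ (alignof, sizeof, offsetof); the helpers roundUp and checkLayout are names I made up for illustration. It assumes a conventional ABI in which char has alignment 1, so it describes the host layout and not necessarily the layout for another target triple.

```cpp
#include <cassert>
#include <cstddef>

// The struct from the question.
typedef struct zTy {
    int x;
    char c[2];
    struct { char d; } v;
} z;

// Hypothetical helper: round n up to the next multiple of a.
constexpr std::size_t roundUp(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// The struct takes the strictest alignment among its members (the int here).
static_assert(alignof(z) == alignof(int), "struct alignment is the max member alignment");

// char members have alignment 1, so they are packed right after the int.
static_assert(offsetof(z, c) == sizeof(int), "c follows x directly");
static_assert(offsetof(z, v) == sizeof(int) + 2, "v follows c directly");

// The total size is the end of the last member, rounded up to the alignment.
static_assert(sizeof(z) == roundUp(sizeof(int) + 3, alignof(z)), "size is padded");

// Run-time variant of the padding check, for callers that prefer an assert.
inline void checkLayout() {
    assert(sizeof(z) % alignof(z) == 0);
}
```

If one of the static_asserts fails to compile, the host ABI lays the struct out differently from the rule described above.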

Clang has a cc1 option, -fdump-record-layouts, that helps with figuring out the layout of a struct or class. For example:

$ echo "struct zTy {
    int x;
    char c[2];
    struct  { char d; } v;
} z;" | clang -x c  -w - -Xclang -fdump-record-layouts  -c

*** Dumping AST Record Layout
         0 | struct zTy::(anonymous at <stdin>:4:5)
         0 |   char d
           | [sizeof=1, align=1]

*** Dumping AST Record Layout
         0 | struct zTy
         0 |   int x
         4 |   char [2] c
         6 |   struct zTy::(anonymous at <stdin>:4:5) v
         6 |     char d
           | [sizeof=8, align=4]

Inside LLVM you lose the "C" types, but you can still inspect the layout of a struct type through DataLayout:

const StructLayout *getStructLayout(StructType *Ty) const;

Then, using the returned StructLayout, you can get the offset of each element in bits (StructLayout::getElementOffset returns the same offset in bytes) using:

uint64_t StructLayout::getElementOffsetInBits(unsigned Idx) const;
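For arbitrarily nested types the same idea is applied recursively: struct fields are placed at their element offsets, and array elements are placed at multiples of the element's alloc size. The sketch below illustrates that recursion on a toy type model so that it can be compiled without LLVM. Everything in it (Ty, scalar, array, record, flatten, Leaf) is hypothetical and not part of the LLVM API; in a real pass the field offsets would come from StructLayout::getElementOffset and the array stride from getTypeAllocSize. It also assumes non-packed structs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR type. This is NOT an LLVM class.
struct Ty {
    enum Kind { Scalar, Array, Struct } kind;
    std::uint64_t size;     // Scalar: size in bytes
    std::uint64_t align;    // Scalar: ABI alignment in bytes
    std::uint64_t count;    // Array: number of elements
    std::vector<Ty> elems;  // Array: the element type; Struct: the fields
};

Ty scalar(std::uint64_t size, std::uint64_t align) { return {Ty::Scalar, size, align, 0, {}}; }
Ty array(std::uint64_t n, Ty elem) { return {Ty::Array, 0, 1, n, {std::move(elem)}}; }
Ty record(std::vector<Ty> fields) { return {Ty::Struct, 0, 1, 0, std::move(fields)}; }

std::uint64_t alignTo(std::uint64_t v, std::uint64_t a) {
    assert(a != 0);
    return (v + a - 1) / a * a;
}

// Arrays and structs take the strictest alignment found among their members.
std::uint64_t abiAlign(const Ty &t) {
    if (t.kind == Ty::Scalar) return t.align;
    std::uint64_t a = 1;
    for (const Ty &e : t.elems) a = std::max(a, abiAlign(e));
    return a;
}

std::uint64_t allocSize(const Ty &t);

// Byte offset of every field, with the unpadded end of the struct as last entry.
std::vector<std::uint64_t> fieldOffsets(const Ty &s) {
    std::vector<std::uint64_t> offs;
    std::uint64_t off = 0;
    for (const Ty &f : s.elems) {
        off = alignTo(off, abiAlign(f));  // padding before the field
        offs.push_back(off);
        off += allocSize(f);
    }
    offs.push_back(off);
    return offs;
}

// Size including tail padding (the role getTypeAllocSize plays in LLVM).
std::uint64_t allocSize(const Ty &t) {
    if (t.kind == Ty::Scalar) return alignTo(t.size, t.align);
    if (t.kind == Ty::Array) return t.count * allocSize(t.elems[0]);
    return alignTo(fieldOffsets(t).back(), abiAlign(t));
}

struct Leaf {
    std::string path;      // e.g. ".1[0]" = field 1, array element 0
    std::uint64_t offset;  // byte offset from the start of the outermost type
    std::uint64_t size;    // size of the scalar in bytes
};

// Walk the type tree and record the absolute byte offset of every scalar.
void flatten(const Ty &t, std::uint64_t base, const std::string &path, std::vector<Leaf> &out) {
    if (t.kind == Ty::Scalar) {
        out.push_back({path, base, t.size});
    } else if (t.kind == Ty::Array) {
        const std::uint64_t stride = allocSize(t.elems[0]);
        for (std::uint64_t i = 0; i < t.count; ++i)
            flatten(t.elems[0], base + i * stride, path + "[" + std::to_string(i) + "]", out);
    } else {
        const std::vector<std::uint64_t> offs = fieldOffsets(t);
        for (std::size_t i = 0; i < t.elems.size(); ++i)
            flatten(t.elems[i], base + offs[i], path + "." + std::to_string(i), out);
    }
}
```

Modelling the struct from the question as record({scalar(4, 4), array(2, scalar(1, 1)), record({scalar(1, 1)})}) and calling flatten on it with base 0 yields leaves at byte offsets 0, 4, 5 and 6, and allocSize returns 8, which matches the record layout dump above.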