c assembly x86 reverse-engineering

Reverse engineer array dimensions / struct layout from compiler asm output?


In this code, A and B are constants defined with #define. What are the values of A and B?

typedef struct {
    int x[A][B];
    long y;
} str1;

typedef struct {
    char array[B];
    int t;
    short S[A];
    long u;
} str2;

void setVal(str1 *p, str2 *q) {
    long v1 = q->t;
    long v2 = q->u;
    p->y = v1+v2;
}

The following assembly code is generated for the setVal procedure:

setVal:
    movslq  8(%rsi), %rax
    addq   32(%rsi), %rax
    movq     %rax, 184(%rdi)
    ret

Solution

  • The field types have the following alignment requirements:

    • a char may start at any byte offset
    • a short may start at any even byte offset
    • an int may start at any byte offset divisible by four
    • a long may start at any byte offset divisible by eight

    The str1.y field is a long and starts at offset 184, which implies that str1.x occupies either 180 or 184 bytes: its size is a multiple of 4, and 176 bytes or fewer would place y at offset 176 or earlier.

    The str2.t field is an int and starts at offset 8, which implies that str2.array occupies from 5 to 8 bytes.

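    The str2.u field is a long and starts at offset 32, which implies that str2.S occupies from 14 to 20 bytes: S starts at offset 12, right after t, and must end after offset 24 but no later than offset 32.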

    This is the diagram for str1 structure fields:

    +---------------+---+--------+
    |  int x[A][B]  | ? | long y |
    +---------------+---+--------+
    |        184        |    8   |
    +-------------------+--------+
    

    And this is the diagram for str2 fields:

    +---------------+---+-------+------------+---+--------+
    | char array[B] | ? | int t | short S[A] | ? | long u |
    +---------------+---+-------+------------+---+--------+
    |         8         |   4   |        20      |    8   |
    +-------------------+-------+----------------+--------+
    

    After that, you should solve the following system:

    177 <= 4 * A * B <= 184
    5 <= B <= 8
    14 <= A * 2 <= 20      // 7 <= A <= 10
    

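    Since 4 * A * B is a multiple of 4, it must be 180 or 184, so A * B is 45 or 46. With 7 <= A <= 10 and 5 <= B <= 8, the only possible factorization is 45 = 9 * 5, so the answer is: A = 9, B = 5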


    You can test your answer (and the ranges for each inequality) using a compiler that follows the same ABI / calling convention as the compiler that produced the original code. That code uses an 8-byte long: note the 64-bit operand size of addq (instead of addl), the sign-extending movslq load of the int field, and the 8-byte movq store. Thus, we can infer that it's most likely the x86-64 System V ABI, not the Windows x86-64 calling convention (which uses 4-byte long).

    The Godbolt compiler explorer has gcc, clang, ICC, and MSVC. The first 3 target Linux, but MSVC targets the Windows calling convention and thus won't agree on struct layout, because its smaller (4-byte) long requires less alignment.

    Replacing int x[A][B] with char t[177] (or other sizes) shows that 177 is the minimum and 184 is the maximum size that leads to a store to 184(%rdi). So we could have written 176 < 4*A*B <= 184. Or, to keep things to multiples of 4, 180 <= 4*A*B <= 184 is also correct; we can rule out 177..179 because the array size must be a multiple of the size of int.