Search code examples
arraysassemblyx86reverse-engineeringgnu-assembler

Reverse engineering and interpreting assembly code


I am having difficulty with reverse engineering this assembly code to deduce the values of the array's dimensions.

I am given

struct vec3 {
  long z;
  int x;
  unsigned short y;

};

struct vec3 array1[2][A];
struct vec3 array2[8][B];
int arrayfunc(int i1, int j1, int i2, int j2){
   return array1[i1][j1].x  + array1[i1][j1].y - array2[i2][j2].y;
}

This is the C code provided and the types of the member data x,y,z were unknown but this is what I deduced them to be.

arrayfunc:
    leaq    array1(%rip), %rax
    movslq  %ecx, %rcx
    movslq  %edx, %r10
    movslq  %r9d, %r9
    leaq    (%rcx,%rcx,2), %rdx
    movslq  %r8d, %r8
    movq    %rax, %rcx
    addq    %r10, %rdx
    salq    $4, %rdx
    movzwl  12(%rax,%rdx), %eax
    addl    8(%rcx,%rdx), %eax
    leaq    (%r9,%r8,2), %rdx
    leaq    array2(%rip), %rcx
    salq    $4, %rdx
    movzwl  12(%rcx,%rdx), %edx
    subl    %edx, %eax
    ret    

The issue here is that I am not sure how I can find the values of A and B from the assembly code.

Any and all help is always appreciated :)

Thanks :))


Solution

  • Indexing a 2D array has to scale the first index by sizeof(struct vec3[A]): array1 is an array of arrays, and each smaller array has A elements. So you look at the asm and see what it's multiplying by.

    Given, struct vec3 array1[2][A];,
    array1[i1][j1].x is the same address math as for a flat 1D array: array1[ (i1*A) + j1 ].x. And in C, we index by element not bytes, so the asm also has to scale by sizeof(struct vec3). That's clearly what the sal $4, %reg instructions are doing, because after padding for alignment the struct size is 16 bytes.

    Notice that the leading dimension [2] doesn't come into the calculation at all; that just tells you how much total space you have. It's the later dimensions that set the geometry; the stride between the the same column in different rows.


    If you don't already see how that C would compile for different A and B values, try it with some sample ones and see what changes when you increase A or B by 1. https://godbolt.org/ is ideal for playing around with stuff like that.

    e.g. https://godbolt.org/z/zrecTcqMs uses prime numbers 3 and 7 for A and B, so even without changing the numbers, you can see which are multiples of which.

    Except GCC is too clever for it to be that simple: it's multiplying using one or two LEA, e.g. RCX + RCX*2 = RCX*3, not using imul $3, %rcx, %rdx for example. If you use large non-simple numbers like 12345 for A and B, you'll see actual imul. https://godbolt.org/z/4G3qc5d5E.


    I used gcc -fpie to make it use position-independent code: a RIP-relative LEA to get array addresses into registers, instead of addressing modes like array1(%rcx, %rdx, 2) which require the array address (in the .data or .bss section) to fit in a 32-bit sign-extended disp32 in the machine code.

    I also used __attribute__((ms_abi)) to use the Windows x64 calling convention like your code does, since GCC on the Godbolt compiler explorer is targeting Linux. (MSVC is the only compiler on Godbolt that targets Windows by default, but it won't output in AT&T syntax.)