I am having difficulty with reverse engineering this assembly code to deduce the values of the array's dimensions.
I am given
struct vec3 {
long z;
int x;
unsigned short y;
};
struct vec3 array1[2][A];
struct vec3 array2[8][B];
int arrayfunc(int i1, int j1, int i2, int j2){
return array1[i1][j1].x + array1[i1][j1].y - array2[i2][j2].y;
}
This is the C code provided and the types of the member data x,y,z were unknown but this is what I deduced them to be.
arrayfunc:
leaq array1(%rip), %rax
movslq %ecx, %rcx
movslq %edx, %r10
movslq %r9d, %r9
leaq (%rcx,%rcx,2), %rdx
movslq %r8d, %r8
movq %rax, %rcx
addq %r10, %rdx
salq $4, %rdx
movzwl 12(%rax,%rdx), %eax
addl 8(%rcx,%rdx), %eax
leaq (%r9,%r8,2), %rdx
leaq array2(%rip), %rcx
salq $4, %rdx
movzwl 12(%rcx,%rdx), %edx
subl %edx, %eax
ret
The issue here is that I am not sure how I can find the values of A and B from the assembly code.
Any and all help is always appreciated :)
Thanks :))
Indexing a 2D array has to scale the first index by sizeof(struct vec3[A])
: array1
is an array of arrays, and each smaller array has A
elements. So you look at the asm and see what it's multiplying by.
Given, struct vec3 array1[2][A];
,
array1[i1][j1].x
is the same address math as for a flat 1D array: array1[ (i1*A) + j1 ].x
. And in C, we index by element not bytes, so the asm also has to scale by sizeof(struct vec3)
. That's clearly what the sal $4, %reg
instructions are doing, because after padding for alignment the struct size is 16 bytes.
Notice that the leading dimension [2]
doesn't come into the calculation at all; that just tells you how much total space you have. It's the later dimensions that set the geometry; the stride between the the same column in different rows.
If you don't already see how that C would compile for different A and B values, try it with some sample ones and see what changes when you increase A or B by 1. https://godbolt.org/ is ideal for playing around with stuff like that.
e.g. https://godbolt.org/z/zrecTcqMs uses prime numbers 3 and 7 for A and B, so even without changing the numbers, you can see which are multiples of which.
Except GCC is too clever for it to be that simple: it's multiplying using one or two LEA, e.g. RCX + RCX*2
= RCX*3, not using imul $3, %rcx, %rdx
for example. If you use large non-simple numbers like 12345 for A and B, you'll see actual imul
. https://godbolt.org/z/4G3qc5d5E.
I used gcc -fpie
to make it use position-independent code: a RIP-relative LEA to get array addresses into registers, instead of addressing modes like array1(%rcx, %rdx, 2)
which require the array address (in the .data or .bss section) to fit in a 32-bit sign-extended disp32 in the machine code.
I also used __attribute__((ms_abi))
to use the Windows x64 calling convention like your code does, since GCC on the Godbolt compiler explorer is targeting Linux. (MSVC is the only compiler on Godbolt that targets Windows by default, but it won't output in AT&T syntax.)