Search code examples
cassemblyx86bigint

implement 64-bit arithmetic on a 32-bit machine


The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to be equivalent to long long.

typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y) {
    *dest = x*y;
}

gcc generates the following assembly code implementing the computation: dest at %ebp+8, x at %ebp+12, y at %ebp+16

1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)

This code uses three multiplications to implement the multiprecision arithmetic required to implement 64-bit arithmetic on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm.

I don't understand line 8 and line 9 in assembly code above. Can anyone help?


Solution

  • I've converted it to intel syntax.

    mov esi, y_low
    mov eax, x
    mov edx, eax
    sar edx, 31
    mov ecx, y_high
    
    imul ecx, eax ; ecx = y_high *{signed} x
    
    mov ebx, edx
    
    imul ebx, esi ; ebx = sign_extension(x) *{signed} y_low
    
    add ecx, ebx ; ecx = y_high *{signed} x_low + x_high *{signed} y_low
    
    mul esi ; edx:eax = x_low *{unsigned} y_low
    
    lea edx, [ecx + edx] ; edx = high(x_low *{unsigned} y_low + y_high *{signed} x_low + x_high *{signed} y_low)
    
    mov ecx, dest
    mov [ecx], eax
    mov [ecx + 4], edx
    

    What the above code does is multiplication of 2 64-bit signed integers that keeps the least-significant 64 bits of the product.

    Where does the other 64-bit multiplicand come from? It's x sign-extended from 32 bits to 64. The sar instruction is used to replicate x's sign bit into all bits of edx. I call this value consisting only of the x's sign x_high. x_low is the value of x actually passed into the routine.

    y_low and y_high are the least and most significant parts of y, just like x's x_low and x_high are.

    From here it's pretty easy:

    product = y *{signed} x =
    (y_high * 232 + y_low) *{signed} (x_high * 232 + x_low) =
    y_high *{signed} x_high * 264 +
    y_high *{signed} x_low * 232 +
    y_low *{signed} x_high * 232 +
    y_low *{signed} x_low

    y_high *{signed} x_high * 264 isn't calculated because it doesn't contribute to the least significant 64 bits of the product. We'd calculate it if we were interested in the full 128-bit product (full 96-bit product for the picky).

    y_low *{signed} x_low is calculated using unsigned multiplication. It's legal to do so because 2's complement signed multiplication gives the same least significant bits as unsigned multiplication. Example:
    -1 *{signed} -1 = 1
    0xFFFFFFFFFFFFFFFF *{unsigned} 0xFFFFFFFFFFFFFFFF = 0xFFFFFFFFFFFFFFFE0000000000000001 (64 least significant bits are equivalent to 1)