Multiply two 32-bit numbers to get a 64-bit number on an 8086 (32x32 => 64-bit with 16-bit multiplies)

How can I multiply two 32-bit numbers in assembly, or one 32-bit number by a 16-bit number? Does anyone know the algorithm?

data1 dw 32bit
data2 dw 32bit    
mov ax,data2
Mul data1

Solution

First, dw is used to create a 16-bit ("word") value. It won't hold a 32-bit value. You'd need to use dd to store a 32-bit "dword", or use a pair of 16-bit values.

Multiplying a pair of 32-bit values can produce a 64-bit result (e.g. 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001). On an actual 8086 (as opposed to real-mode code running on an 80386 or later) there is a MUL instruction, but it is limited to multiplying two 16-bit values (giving a 32-bit result in DX:AX). This means that you'd want to treat each 32-bit value as a pair of 16-bit values.

If A is split into A_low (the lowest 16 bits of the first 32-bit number) and A_high (the highest 16 bits of the first 32-bit number), and B is split into B_low and B_high in the same way, then:

  A * B = A_low * B_low
          + ( A_high * B_low ) << 16
          + ( A_low * B_high ) << 16
          + ( A_high * B_high ) << 32
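
As a cross-check of the decomposition above, here is a minimal C sketch (not 8086 code) that builds the 64-bit product from four 16x16 => 32-bit partial products and compares it with the compiler's native 64-bit multiply. The names mul32x32_via16 and check_mul32x32 are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32x32 => 64-bit product using only 16x16 => 32-bit
   multiplies, mirroring the four-term formula. */
uint64_t mul32x32_via16(uint32_t a, uint32_t b)
{
    uint32_t a_low = a & 0xFFFFu, a_high = a >> 16;
    uint32_t b_low = b & 0xFFFFu, b_high = b >> 16;

    /* Each partial product fits in 32 bits, like DX:AX after a 16-bit MUL. */
    uint64_t ll = a_low  * b_low;
    uint64_t hl = a_high * b_low;
    uint64_t lh = a_low  * b_high;
    uint64_t hh = a_high * b_high;

    /* Shift each term into place; the total always fits in 64 bits. */
    return ll + (hl << 16) + (lh << 16) + (hh << 32);
}

/* Cross-check against the native 64-bit multiply. */
void check_mul32x32(uint32_t a, uint32_t b)
{
    assert(mul32x32_via16(a, b) == (uint64_t)a * b);
}
```

Calling check_mul32x32(0x12345678u, 0x9ABCDEF0u) exercises the same operands as the NASM listing, and mul32x32_via16(0xFFFFFFFFu, 0xFFFFFFFFu) gives the 0xFFFFFFFE00000001 maximum mentioned earlier.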

The code might look like this (NASM syntax):

         section .data
first:   dw 0x5678, 0x1234  ;0x12345678
second:  dw 0xDEF0, 0x9ABC  ;0x9ABCDEF0
result:  dw 0, 0, 0, 0      ;0x0000000000000000
         section .text

    mov ax,[first]          ;ax = A_low
    mul word [second]       ;dx:ax = A_low * B_low
    mov [result],ax
    mov [result+2],dx       ;Result = A_low * B_low

    mov ax,[first+2]        ;ax = A_high
    mul word [second]       ;dx:ax = A_high * B_low
    add [result+2],ax
    adc [result+4],dx       ;Result = A_low * B_low
                            ;       + (A_high * B_low) << 16

    mov ax,[first]          ;ax = A_low
    mul word [second+2]     ;dx:ax = A_low * B_high
    add [result+2],ax
    adc [result+4],dx       ;Result = A_low * B_low
                            ;       + (A_high * B_low) << 16
                            ;       + (A_low * B_high) << 16
    adc word [result+6], 0   ; carry could propagate into the top chunk

    mov ax,[first+2]        ;ax = A_high
    mul word [second+2]     ;dx:ax = A_high * B_high
    add [result+4],ax
    adc [result+6],dx       ;Result = A_low * B_low
                            ;       + (A_high * B_low) << 16
                            ;       + (A_low * B_high) << 16
                            ;       + (A_high * B_high) << 32

We don't need adc word [result+6], 0 after the second step ([first+2] * [second]) because the high half of that product is at most 0xFFFE, and [result+4] is still zero at that point (this code relies on result being zeroed beforehand, so it only works once). The adc [result+4],dx therefore can't wrap and produce a carry out; the most it can produce is 0xFFFF.
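
The bound in the previous paragraph can be checked with a small C sketch. It is only a model of the arithmetic, not 8086 code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* High 16 bits of a 16x16 => 32-bit product (what MUL leaves in DX). */
uint16_t mul16_high(uint16_t x, uint16_t y)
{
    return (uint16_t)(((uint32_t)x * y) >> 16);
}

/* Worst case for the second step: the largest possible DX plus an incoming
   carry of 1, added to a [result+4] word that is still zero.
   Returns the carry out of that 16-bit addition. */
unsigned step2_carry_out_worst_case(void)
{
    uint32_t dx  = mul16_high(0xFFFF, 0xFFFF);  /* 0xFFFF * 0xFFFF = 0xFFFE0001 */
    uint32_t sum = 0u + dx + 1u;                /* adc [result+4],dx with CF = 1 */

    assert(dx == 0xFFFEu);
    assert(sum == 0xFFFFu);
    return sum >> 16;                           /* 0: nothing carries into [result+6] */
}
```

Since the product of two 16-bit values grows with its operands, 0xFFFF * 0xFFFF is the largest case, so no other input can produce a carry at that step either.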

(It could be done as adc dx, 0 / mov [result+4], dx to avoid depending on that part of result being already zeroed. Similarly, adc into a zeroed register could be used for the first write to [result+6], to make this code usable without first zeroing result.)
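
To illustrate that variant, here is a rough C model of the word-by-word accumulation in which every result word is written before it is read, so the destination does not need to be zeroed first. It is a sketch of the idea under that assumption, not a translation of any particular listing; mul32x32_words and check_words are hypothetical names, and the comments only indicate which instruction each line is meant to model.

```c
#include <assert.h>
#include <stdint.h>

/* a[], b[] and r[] hold 16-bit words, least significant first. */
void mul32x32_words(const uint16_t a[2], const uint16_t b[2], uint16_t r[4])
{
    uint32_t p, t;

    p = (uint32_t)a[0] * b[0];                   /* A_low * B_low */
    r[0] = (uint16_t)p;
    r[1] = (uint16_t)(p >> 16);

    p = (uint32_t)a[1] * b[0];                   /* A_high * B_low */
    t = (uint32_t)r[1] + (uint16_t)p;            /* add [result+2],ax */
    r[1] = (uint16_t)t;
    r[2] = (uint16_t)((p >> 16) + (t >> 16));    /* adc dx,0 / mov [result+4],dx */

    p = (uint32_t)a[0] * b[1];                   /* A_low * B_high */
    t = (uint32_t)r[1] + (uint16_t)p;            /* add [result+2],ax */
    r[1] = (uint16_t)t;
    t = (uint32_t)r[2] + (p >> 16) + (t >> 16);  /* adc [result+4],dx */
    r[2] = (uint16_t)t;
    r[3] = (uint16_t)(t >> 16);                  /* carry is the first write to [result+6] */

    p = (uint32_t)a[1] * b[1];                   /* A_high * B_high */
    t = (uint32_t)r[2] + (uint16_t)p;            /* add [result+4],ax */
    r[2] = (uint16_t)t;
    r[3] = (uint16_t)(r[3] + (p >> 16) + (t >> 16)); /* adc [result+6],dx */
}

/* Cross-check against a native 64-bit multiply, starting from a result
   buffer that is deliberately not zeroed. */
void check_words(uint32_t x, uint32_t y)
{
    uint16_t a[2] = { (uint16_t)x, (uint16_t)(x >> 16) };
    uint16_t b[2] = { (uint16_t)y, (uint16_t)(y >> 16) };
    uint16_t r[4] = { 0xDEAD, 0xBEEF, 0xDEAD, 0xBEEF };
    uint64_t got;

    mul32x32_words(a, b, r);
    got = (uint64_t)r[0] | ((uint64_t)r[1] << 16)
        | ((uint64_t)r[2] << 32) | ((uint64_t)r[3] << 48);
    assert(got == (uint64_t)x * y);
}
```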

If you are actually using an 80386 or later, then it's much simpler:

         section .data
first:   dd 0x12345678
second:  dd 0x9ABCDEF0
result:  dd 0, 0            ;0x0000000000000000
         section .text

    mov eax,[first]          ;eax = A
    mul dword [second]       ;edx:eax = A * B
    mov [result],eax
    mov [result+4],edx       ;Result = A * B