NASM floating point - invalid combination of opcode and operands

I am trying to compile the following code sample (NASM syntax) from this article on x86 assembly floating point:

;; c^2 = a^2 + b^2 - cos(C)*2*a*b
;; C is stored in ang

global _start

section .data
    a: dq 4.56   ;length of side a
    b: dq 7.89   ;length of side b
    ang: dq 1.5  ;opposite angle to side c (around 85.94 degrees)

section .bss
    c: resq 1    ;the result ‒ length of side c

section .text
    _start:

    fld qword [a]   ;load a into st0
    fmul st0, st0   ;st0 = a * a = a^2

    fld qword [b]   ;load b into st1
    fmul st1, st1   ;st1 = b * b = b^2

    fadd st1, st0   ;st1 = a^2 + b^2

    fld qword [ang] ;load angle into st0
    fcos            ;st0 = cos(ang)

    fmul qword [a]  ;st0 = cos(ang) * a
    fmul qword [b]  ;st0 = cos(ang) * a * b
    fadd st0, st0   ;st0 = cos(ang) * a * b + cos(ang) * a * b = 2(cos(ang) * a * b)

    fsubp st1, st0  ;st1 = st1 - st0 = (a^2 + b^2) - (2 * a * b * cos(ang))
                    ;and pop st0

    fsqrt           ;take square root of st0 = c

    fst qword [c]   ;store st0 in c ‒ and we're done!

When I execute the following command:

nasm -f elf32 cosineSample.s -o cosineSample.o

I get the following error for the line fmul st1, st1:

error: invalid combination of opcode and operands

What do I need to do to resolve this? Do I need to pass special arguments to nasm? Is the code sample wrong?

Solution

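Unfortunately, that code is broken. fmul cannot operate on st1, st1: the register forms of the x87 arithmetic instructions require st0 as one of the two operands. Even if it could, it would not do what the author wanted. According to the comment, he wanted to calculate b*b, but b is in st0 at that point. The comment "load b into st1" is wrong: fld always pushes onto the top of the stack (st0), and the value that was there moves down to st1. You need to change fmul st1, st1 to fmul st0, st0. Furthermore, to get the correct result, the operands of the following fadd st1, st0 have to be reversed as well, so that the sum is in st0 and moves down to st1 when the angle is loaded, which is where fsubp st1, st0 expects it. The code also leaves the FPU stack dirty, because values are still on it when the program ends.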
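To illustrate the operand rule, here is a minimal NASM sketch of the register forms that fmul does accept. The label x, its value, and the file name in the build comment are made up for illustration, and the build line assumes a 32-bit Linux target.

```nasm
; Assumed build (hypothetical file name):
;   nasm -f elf32 forms.s -o forms.o && ld -m elf_i386 forms.o -o forms
global _start

section .data
    x: dq 3.0           ; hypothetical test value

section .text
_start:
    fld qword [x]       ; st0 = 3.0
    fld qword [x]       ; st0 = 3.0, st1 = 3.0

    fmul st0, st1       ; accepted: st0 = st0 * st1 = 9.0
    fmul st1, st0       ; accepted: st1 = st1 * st0 = 27.0
    fmul qword [x]      ; accepted: st0 = st0 * [x] = 27.0
    ; fmul st1, st1     ; rejected by NASM: neither operand is st0

    fstp st0            ; pop st0
    fstp st0            ; pop again, the FPU stack is now empty

    mov eax, 1          ; sys_exit (32-bit Linux int 0x80 ABI)
    xor ebx, ebx        ; exit status 0
    int 0x80
```

Uncommenting the fmul st1, st1 line should reproduce the "invalid combination of opcode and operands" error from the question.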

Also note that the program has no ending: execution runs past the last instruction, so it will segfault unless you add an explicit exit system call.

Here is the fixed code, converted to GNU assembler syntax (Intel flavor):

.intel_syntax noprefix

.global _start

.data
    a: .double 4.56   # length of side a
    b: .double 7.89   # length of side b
    ang: .double 1.5  # opposite angle to side c (around 85.94 degrees)

.lcomm c, 8

.text
    _start:

    fld qword ptr [a]   # load a into st0
    fmul st             # st0 = a * a = a^2

    fld qword ptr [b]   # load b into st0
    fmul st             # st0 = b * b = b^2

    faddp               # st0 = a^2 + b^2

    fld qword ptr [ang] # load angle into st0
    fcos                # st0 = cos(ang)

    fmul qword ptr [a]  # st0 = cos(ang) * a
    fmul qword ptr [b]  # st0 = cos(ang) * a * b
    fadd st             # st0 = cos(ang) * a * b + cos(ang) * a * b = 2(cos(ang) * a * b)

    fsubp               # st1 = st1 - st0 = (a^2 + b^2) - (2 * a * b * cos(ang))
                        # and pop st0

    fsqrt               # take square root of st0 = c

    fstp qword ptr [c]  # store st0 in c and pop, leaving the FPU stack empty

    # end program
    mov eax, 1          # sys_exit (32-bit Linux int 0x80 ABI)
    xor ebx, ebx        # exit status 0
    int 0x80
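
Since the question uses NASM, the same fixes can also be sketched in NASM syntax. This is a direct translation of the listing above rather than separately tested code. It spells out the operands of faddp and fsubp explicitly and assumes a 32-bit Linux target for the exit call.

```nasm
;; c^2 = a^2 + b^2 - 2*a*b*cos(C)
;; C is stored in ang

global _start

section .data
    a:   dq 4.56        ; length of side a
    b:   dq 7.89        ; length of side b
    ang: dq 1.5         ; angle opposite side c, in radians

section .bss
    c: resq 1           ; the result: length of side c

section .text
_start:
    fld qword [a]       ; st0 = a
    fmul st0, st0       ; st0 = a^2

    fld qword [b]       ; st0 = b, st1 = a^2
    fmul st0, st0       ; st0 = b^2

    faddp st1, st0      ; st1 = a^2 + b^2, then pop: st0 = a^2 + b^2

    fld qword [ang]     ; st0 = ang, st1 = a^2 + b^2
    fcos                ; st0 = cos(ang)
    fmul qword [a]      ; st0 = cos(ang) * a
    fmul qword [b]      ; st0 = cos(ang) * a * b
    fadd st0, st0       ; st0 = 2 * a * b * cos(ang)

    fsubp st1, st0      ; st1 = st1 - st0, then pop: st0 = c^2

    fsqrt               ; st0 = c
    fstp qword [c]      ; store c and pop, the FPU stack is now empty

    mov eax, 1          ; sys_exit (32-bit Linux int 0x80 ABI)
    xor ebx, ebx        ; exit status 0
    int 0x80
```

This should assemble with the nasm command from the question. Linking with something like ld -m elf_i386 cosineSample.o -o cosineSample is an assumption about the toolchain. The program prints nothing, but with the values above the number stored in c should be roughly 8.83, which you can check in a debugger.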