cpu-architecture, ieee-754

Additive inverse of a number: subtraction from zero or multiplication by −1


Which method is faster for negating a number, -1*a or 0-a, where a is a double?


Solution

  • Both are terrible; just flip the sign bit with an XOR or a dedicated FP negation instruction.

    IEEE-754 floating point uses a sign/magnitude representation, so -x differs from x by exactly one bit: the sign bit. On x86 with SSE, for example, you can use xorps (see "How to negate (change sign) of the floating point elements in a __m128 type variable?"). This flips NaN to -NaN or vice versa without changing the payload.
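
    As a rough illustration of the sign-bit flip described above, here is a minimal C sketch. It assumes a 64-bit IEEE-754 double; the helper names (double_to_bits, bits_to_double, negate_bits) are hypothetical and exist only for this example. It shows what the operation does to the bits, not how you should write negation in real code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: double is the 64-bit IEEE-754 binary64 format. */
static_assert(sizeof(double) == sizeof(uint64_t), "assumes a 64-bit double");

/* Reinterpret a double as its raw bit pattern (memcpy is the well-defined way to pun). */
uint64_t double_to_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Reinterpret a raw bit pattern as a double. */
double bits_to_double(uint64_t bits)
{
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical helper: negate by XORing the sign bit (bit 63) and nothing else. */
double negate_bits(double x)
{
    return bits_to_double(double_to_bits(x) ^ (UINT64_C(1) << 63));
}
```

    Because only bit 63 changes, the exponent and mantissa (including a NaN payload) pass through untouched, and 0.0 becomes -0.0.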

    In C, write it as -a and see what your compiler does.

    Even better, you can often optimize away a negation entirely: do a subtract later instead of an add, use an FMSUB or FNMADD instead of an FMADD, or produce a in the first place with an FNMSUB instead of an FMADD so the negation happens as part of the FMA.
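
    To make the "fold the negation into a neighbouring operation" idea concrete, here is a small C sketch. The function names are hypothetical. Whether a compiler turns t - x*y into a single fused instruction depends on the target and on its FP-contraction settings, so treat that part as an assumption rather than a guarantee.

```c
#include <assert.h>

/* Negate a, then add it: conceptually two operations. */
double add_negated(double t, double a)
{
    return t + (-a);
}

/* Same value with no separate negation: subtract instead of add. */
double subtract(double t, double a)
{
    return t - a;
}

/* t - x*y: the negated product is folded into the expression.
 * Where FP contraction is allowed, a compiler may (not must) emit
 * one fused negate-multiply-add style instruction for this. */
double sub_product(double t, double x, double y)
{
    return t - x * y;
}

/* Small self-check using exactly representable values. */
void check_folded_negation(void)
{
    assert(add_negated(5.0, 2.0) == subtract(5.0, 2.0));
    assert(sub_product(10.0, 2.0, 3.0) == 4.0);
}
```

    The point is that the source never needs a standalone negation: the sign change rides along with an operation you were going to do anyway.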


    But if you had to choose between an actual FP multiply and an FP add instruction, subtraction normally has latency at least as good as multiplication.

    Intel Haswell and Broadwell have multiply throughput twice as good as add throughput (running on FMA units with worse or equal latency to add), but most microarchitectures (including modern x86 Ryzen and Skylake) have balanced FP add vs. multiply throughput.

    On non-x86 architectures, add will generally be at least as cheap as multiply. But again, most ISAs have some special way of negating, like x86's SSE1 xorps or legacy x87 fchs (CHange Sign).

    Similarly, a Boolean AND or ANDN (or a dedicated instruction that has the "mask" built in) that unconditionally clears the sign bit gives you an absolute value.
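
    The absolute-value trick can be sketched the same way in C. This again assumes a 64-bit IEEE-754 double, and the helper names (abs_bits, sign_bit_of) are hypothetical, chosen only for this illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: double is the 64-bit IEEE-754 binary64 format. */
static_assert(sizeof(double) == sizeof(uint64_t), "assumes a 64-bit double");

/* Hypothetical helper: absolute value by ANDing away the sign bit (bit 63). */
double abs_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);     /* well-defined type pun */
    bits &= ~(UINT64_C(1) << 63);       /* clear the sign bit, keep everything else */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Returns 1 if the sign bit of x is set, 0 otherwise. */
unsigned sign_bit_of(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (unsigned)(bits >> 63);
}
```

    For example, abs_bits(-3.25) yields 3.25, and abs_bits(-0.0) yields +0.0 with the sign bit cleared.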