x86, disassembly, dynamic-analysis

Floating Point Instructions in x86 Disassembly of PolyBench Suite


I am trying to count the number of dynamic floating-point instructions executed by the CPU in the binaries that GCC produces for the 30 programs of the PolyBench benchmark suite, using the Pin tool. As far as I can tell from the x86 Encoder Decoder (XED) documentation, all the floating-point instructions in x86 fall under the X87_ALU category.

For some reason, the count is zero for every program, unlike the other instruction categories such as binary, load, store, and nop. I disassembled the binary using objdump and can't see a single line with a mnemonic starting with f.

I also produced WebAssembly (.wasm) binaries for all the programs using Emscripten (emcc) and converted each .wasm binary to a disassembled .wat file. I don't see any floating-point instructions in those files either.

PS: From the Google searches I have been doing, I understand that x86 has a separate floating-point unit with stack-based register handling. Maybe I am missing something on this front?

Any lead on how to see the floating-point instructions in the disassembled binary?


Solution

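  • Almost all modern code avoids the x87 FPU and uses scalar SSE instructions (for example addsd and mulsd) instead, so a count of X87_ALU instructions can legitimately be zero. x87 mnemonics begin with f (fadd, fmul, fld); the scalar SSE ones do not, which is why none show up in the objdump output.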

    Reasons to use FPU (rather than scalar SSE) are:

    • the same binary has to support CPUs from 20 years ago that don't support SSE. This implies 32-bit code rather than 64-bit code, because CPUs that are too old for SSE don't support 64-bit either.
    • using SSE hurts performance due to increased task switch costs (the cost of saving and loading SSE state during task switches). This does not apply in most cases, either because the operating system saves and loads the SSE state regardless of whether it was used, or because SSE is used for other things anyway (e.g. SIMD).
    • you need the extended precision of 80-bit floating point. This mostly doesn't happen: there's only a small niche between "64-bit (or less) is enough" and "80-bit is not enough".
    • you need to do things like sin() and sqrt() with single instructions (x87 has fsin and fsqrt), or use BCD, and code size is significantly more important than performance. This is extremely unlikely.
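
To check which kind of floating-point instructions a binary contains, one option is to tally mnemonics in the text printed by objdump -d. The following is a minimal sketch, not a complete classifier: the function names (mnemonic_of, count_fp), the prefix set, and both regular expressions are assumptions made for illustration, the patterns cover only a small subset of arithmetic mnemonics, and the sample lines are hand-written rather than taken from a PolyBench binary. It also gives a static count, not the dynamic count that Pin produces.

```python
import re
from collections import Counter

# Illustrative subsets only; neither pattern is an exhaustive instruction list.
# x87 arithmetic, including AT&T size suffixes such as "faddl" or "fmuls".
X87_ARITH = re.compile(
    r"^(fi?(add|sub|subr|mul|div|divr)p?[sl]?|fsqrt|fsin|fcos|fabs|fchs)$"
)
# Scalar/packed SSE and AVX arithmetic, plus FMA forms like "vfmadd231sd".
SSE_AVX_ARITH = re.compile(
    r"^v?((add|sub|mul|div|sqrt|min|max)(ss|sd|ps|pd)"
    r"|fn?m(add|sub)(132|213|231)(ss|sd|ps|pd))$"
)
# Prefix tokens that objdump may print in front of the real mnemonic (assumed set).
PREFIXES = {"rep", "repz", "repnz", "lock", "bnd", "notrack", "data16",
            "cs", "ds", "es", "fs", "gs", "ss"}


def mnemonic_of(line):
    """Return the mnemonic from one 'objdump -d' line, or None if there is none."""
    # Instruction lines look like: address, tab, raw bytes, tab, instruction text.
    parts = line.split("\t")
    if len(parts) < 3:
        return None
    for token in parts[2].split():
        if token not in PREFIXES:
            return token
    return None


def count_fp(disassembly):
    """Count x87 versus SSE/AVX arithmetic mnemonics in objdump text."""
    counts = Counter()
    for line in disassembly.splitlines():
        m = mnemonic_of(line)
        if m is None:
            continue
        if X87_ARITH.match(m):
            counts["x87"] += 1
        elif SSE_AVX_ARITH.match(m):
            counts["sse_avx"] += 1
    return counts


if __name__ == "__main__":
    # Hand-written sample in objdump's tab-separated layout.
    sample = (
        "  401136:\tf2 0f 59 c1          \tmulsd  %xmm1,%xmm0\n"
        "  40113a:\tf2 0f 58 c1          \taddsd  %xmm1,%xmm0\n"
        "  40113e:\t48 89 c3             \tmov    %rax,%rbx\n"
    )
    print(count_fp(sample))
```

To run it on a real binary, the output of objdump -d could be saved to a file and its contents passed to count_fp. For the dynamic count in Pin, the analogous change would presumably be to also tally the XED categories that cover SSE and AVX instructions rather than only X87_ALU; the exact category names should be checked against the XED documentation.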