Adding two floating-point numbers

I would like to compute the sum, rounded up, of two IEEE 754 binary64 numbers. To that end I wrote the C99 program below:

#include <stdio.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

int main(int c, char *v[]){
  fesetround(FE_UPWARD);
  printf("%a\n", 0x1.0p0 + 0x1.0p-80);
}

However, when I compile and run my program with various compilers, I get:

$ gcc -v
…
gcc version 4.2.1 (Apple Inc. build 5664)
$ gcc -Wall -std=c99 add.c && ./a.out 
add.c:3: warning: ignoring #pragma STDC FENV_ACCESS
0x1p+0
$ clang -v
Apple clang version 1.5 (tags/Apple/clang-60)
Target: x86_64-apple-darwin10
Thread model: posix
$ clang -Wall -std=c99 add.c && ./a.out 
add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring
      pragma [-Wunknown-pragmas]
#pragma STDC FENV_ACCESS ON
             ^
1 warning generated.
0x1p+0

It doesn't work! (I expected the result 0x1.0000000000001p0).

Indeed, the computation was done at compile-time in the default round-to-nearest mode:

$ clang -Wall -std=c99 -S add.c && cat add.s
add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring
      pragma [-Wunknown-pragmas]
#pragma STDC FENV_ACCESS ON
             ^
1 warning generated.
…
LCPI1_0:
    .quad   4607182418800017408
…
    callq   _fesetround
    movb    $1, %cl
    movsd   LCPI1_0(%rip), %xmm0
    leaq    L_.str(%rip), %rdx
    movq    %rdx, %rdi
    movb    %cl, %al
    callq   _printf
…
L_.str:
    .asciz   "%a\n"
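
To double-check that the constant in the listing really is the round-to-nearest result, the .quad value can be decoded as a binary64 bit pattern. This is only a minimal sketch; bits_to_double is a hypothetical helper name and is not part of the program above:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as an IEEE 754 binary64.
   memcpy is the portable way to do this without violating aliasing rules. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

4607182418800017408 is 0x3FF0000000000000, which decodes to exactly 1.0 (0x1p+0). The rounded-up sum I expected would have the bit pattern 0x3FF0000000000001 instead.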

Yes, I did see the warning emitted by each compiler. I understand that turning the applicable optimizations on or off at the scale of a single line may be tricky. I would still like, if at all possible, to turn them off at the scale of the file, which would be enough to answer my question.

My question is: what command-line option(s) should I use with GCC or Clang so as to compile a C99 compilation unit that contains code intended to be executed with an FPU rounding mode other than the default?

Digression

While researching this question, I found this GCC C99 compliance page, containing the entry below, that I will just leave here in case someone else finds it funny. Grrrr.

floating-point      |     |
environment access  | N/A | Library feature, no compiler support required.
in <fenv.h>         |     |

Solution

Compiling with -frounding-math (GCC or Clang) tells the compiler that the code might run with a non-default rounding mode. It's not fully safe (the compiler still assumes the same rounding mode is active the whole time), but it's better than nothing. You might still need volatile to avoid common-subexpression elimination (CSE) in some cases, or the noinline wrapper trick from the other answer, which in practice may work even better if you limit it to a single operation.

As you noticed, GCC doesn't support #pragma STDC FENV_ACCESS ON. The default behaviour is like FENV_ACCESS OFF. Instead, you have to use command line options (or maybe per-function attributes) to control FP optimizations.

As described in https://gcc.gnu.org/wiki/FloatingPointMath, -frounding-math is not on by default, so GCC assumes the default rounding mode when doing constant propagation and other optimizations at compile-time.

But with gcc -O3 -frounding-math, constant propagation is blocked, even if you don't call fesetround: GCC generates asm that is also safe if the rounding mode had already been set to something else before main was even called.

Unfortunately, as the wiki notes, GCC still assumes that the same rounding mode is in effect everywhere (GCC bug #34678). That means it will CSE two calculations on the same inputs before and after a call to fesetround, because it doesn't treat fesetround as special. For example:

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

void foo(double *restrict out){
    out[0] = 0x1.0p0 + 0x1.0p-80;
    fesetround(FE_UPWARD);
    out[1] = 0x1.0p0 + 0x1.0p-80;
}

This compiles as follows (Godbolt) with gcc10.2 (and essentially the same with clang10.1). The Godbolt link also includes your main, which does produce the asm you want.

foo:
        push    rbx
        mov     rbx, rdi
        sub     rsp, 16
        movsd   xmm0, QWORD PTR .LC1[rip]
        addsd   xmm0, QWORD PTR .LC0[rip]     # runtime add
        movsd   QWORD PTR [rdi], xmm0         # store out[0]
        mov     edi, 2048
        movsd   QWORD PTR [rsp+8], xmm0       # save a local temporary for later
        call    fesetround
        movsd   xmm0, QWORD PTR [rsp+8]
        movsd   QWORD PTR [rbx+8], xmm0       # store the same value, not recalc
        add     rsp, 16
        pop     rbx
        ret

This is the same problem @Marc Glisse warned about in comments under the other answer in case your noinline function did the same math before and after changing the rounding mode.

(He also warned that it's partly luck that GCC chose not to do the math before calling fesetround the first time, which would have let it spill only the result instead of both inputs. x86-64 System V doesn't have any call-preserved XMM registers.)