I would like to compute the sum, rounded up, of two IEEE 754 binary64 numbers. To that end I wrote the C99 program below:
#include <stdio.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
int main(int c, char *v[]){
  fesetround(FE_UPWARD);
  printf("%a\n", 0x1.0p0 + 0x1.0p-80);
}
However, if I compile and run my program with various compilers:
$ gcc -v
…
gcc version 4.2.1 (Apple Inc. build 5664)
$ gcc -Wall -std=c99 add.c && ./a.out
add.c:3: warning: ignoring #pragma STDC FENV_ACCESS
0x1p+0
$ clang -v
Apple clang version 1.5 (tags/Apple/clang-60)
Target: x86_64-apple-darwin10
Thread model: posix
$ clang -Wall -std=c99 add.c && ./a.out
add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring pragma [-Wunknown-pragmas]
#pragma STDC FENV_ACCESS ON
             ^
1 warning generated.
0x1p+0
It doesn't work! (I expected the result 0x1.0000000000001p0.)
Indeed, the computation was done at compile-time in the default round-to-nearest mode:
$ clang -Wall -std=c99 -S add.c && cat add.s
add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring pragma [-Wunknown-pragmas]
#pragma STDC FENV_ACCESS ON
             ^
1 warning generated.
…
LCPI1_0:
        .quad 4607182418800017408
…
        callq _fesetround
        movb $1, %cl
        movsd LCPI1_0(%rip), %xmm0
        leaq L_.str(%rip), %rdx
        movq %rdx, %rdi
        movb %cl, %al
        callq _printf
…
L_.str:
        .asciz "%a\n"
Yes, I did see the warning emitted by each compiler. I understand that turning the applicable optimizations on or off at the scale of a single line may be tricky. I would still like, if at all possible, to turn them off at the scale of the file, which would be enough to resolve my question.
My question is: what command-line option(s) should I use with GCC or Clang so as to compile a C99 compilation unit that contains code intended to be executed with an FPU rounding mode other than the default?
While researching this question, I found this GCC C99 compliance page, containing the entry below, that I will just leave here in case someone else finds it funny. Grrrr.
floating-point environment access in <fenv.h> | N/A | Library feature, no compiler support required.
clang or gcc -frounding-math tells them that code might run with a non-default rounding mode. It's not fully safe (it assumes the same rounding mode is active the whole time), but better than nothing. You might still need to use volatile to avoid CSE in some cases, or maybe the noinline wrapper trick from the other answer, which in practice may work even better if you limit it to a single operation.
As you noticed, GCC doesn't support #pragma STDC FENV_ACCESS ON. The default behaviour is like FENV_ACCESS OFF. Instead, you have to use command-line options (or maybe per-function attributes) to control FP optimizations.
As described in https://gcc.gnu.org/wiki/FloatingPointMath, -frounding-math is not on by default, so GCC assumes the default rounding mode when doing constant propagation and other optimizations at compile-time.
But with gcc -O3 -frounding-math, constant propagation is blocked, even if you don't call fesetround: what's actually happening is that GCC makes asm that's safe if the rounding mode had already been set to something else before main was even called.
But unfortunately, as the wiki notes, GCC still assumes that the same rounding mode is in effect everywhere (GCC bug #34678). That means it will CSE two calculations of the same inputs before/after a call to fesetround, because it doesn't treat fesetround as special.
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void foo(double *restrict out){
    out[0] = 0x1.0p0 + 0x1.0p-80;
    fesetround(FE_UPWARD);
    out[1] = 0x1.0p0 + 0x1.0p-80;
}
compiles as follows (Godbolt) with gcc10.2 (and essentially the same with clang10.1). The Godbolt link also includes your main, which does make the asm you want.
foo:
push rbx
mov rbx, rdi
sub rsp, 16
movsd xmm0, QWORD PTR .LC1[rip]
addsd xmm0, QWORD PTR .LC0[rip] # runtime add
movsd QWORD PTR [rdi], xmm0 # store out[0]
mov edi, 2048
movsd QWORD PTR [rsp+8], xmm0 # save a local temporary for later
call fesetround
movsd xmm0, QWORD PTR [rsp+8]
movsd QWORD PTR [rbx+8], xmm0 # store the same value, not recalc
add rsp, 16
pop rbx
ret
This is the same problem @Marc Glisse warned about in comments under the other answer in case your noinline function did the same math before and after changing the rounding mode.
(And also that it's partly luck that GCC chose not to do the math before calling fesetround the first time, so it would only have to spill the result instead of both inputs. x86-64 System V doesn't have any call-preserved XMM regs.)