floating-point trigonometry floating-accuracy amd-processor x87

Accuracy of FSIN and other x87 trigonometric instructions on AMD processors


On Intel processors, x87 trigonometric instructions such as FSIN have limited accuracy because argument reduction uses a 66-bit approximation of pi, even though the computation itself is otherwise accurate to the full 64-bit mantissa of an 80-bit extended-precision floating-point value. (Full accuracy for all valid inputs requires a 128-bit approximation of pi.) Intel's documentation originally omitted this limitation and was corrected after the issue was brought to the company's attention.
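To make the effect of the 66-bit constant concrete, here is a minimal sketch in exact rational arithmetic. It is not how the hardware computes anything. It only reduces the extended-precision value nearest pi against two versions of pi and compares the results. The 66-bit constant (0.C90FDAA22168C234C hex times 2^2) is the value given in Intel's manual and is an assumption as far as this post is concerned. The names `PI_66`, `X` and `error_ulps` are illustrative.

```python
from fractions import Fraction

# Hex digits of pi after the point (3.243F6A88...), 192 fractional bits.
PI_FRACTION_HEX = "243F6A8885A308D313198A2E03707344A4093822299F31D0"
PI = 3 + Fraction(int(PI_FRACTION_HEX, 16), 16 ** len(PI_FRACTION_HEX))

# Assumption: 66-bit constant from Intel's manual,
# pi ~= 0.C90FDAA22168C234C (hex) * 2**2.
PI_66 = Fraction(0xC90FDAA22168C234C, 2 ** 66)

# The 80-bit extended-precision value nearest pi:
# 64-bit significand 0xC90FDAA22168C235 scaled by 2**-62.
X = Fraction(0xC90FDAA22168C235, 2 ** 62)

# Near pi, sin(x) = sin(pi - x) ~= pi - x. The cubic term is on the
# order of 1e-59 here, far below one ulp, so it is ignored.
sin_exact = PI - X
sin_66 = PI_66 - X

# Spacing of extended-precision values in [2**-65, 2**-64).
ULP = Fraction(1, 2 ** 128)
error_ulps = (sin_66 - sin_exact) / ULP

print("sin(x) with 66-bit pi:", float(sin_66))
print("sin(x) with longer pi:", float(sin_exact))
print("error in ulps        :", float(error_ulps))
```

With the 66-bit constant the reduced argument comes out as exactly -2^-64, while the longer expansion of pi gives about -5.0166E-20, a difference of roughly 1.4E+18 ulp even though both values are tiny.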

However, I cannot find similarly detailed information about the accuracy of AMD's implementation of x87 trigonometric instructions beyond this mention in the AMD64 Architecture Programmer's Manual, Volume 1:

6.4.5.1 Accuracy of Transcendental Results

x87 computations are carried out in double-extended-precision format, so that the transcendental functions provide results accurate to within one unit in the last place (ulp) for each of the floating-point data types.

Is AMD's implementation of x87 trigonometric instructions actually accurate to within one ULP in extended-precision format for all valid inputs, which would require a 128-bit or better approximation of pi? An answer that pertains to the Zen and Zen 2 architectures (Ryzen and EPYC) would be ideal.


Solution

  • I found a program at http://notabs.org/fpuaccuracy/ (GPLv3) designed to test the accuracy of x87 trigonometric instructions. The reference output supplied with the program, generated on an Intel Core i7-2600 (Sandy Bridge), is as follows:

    sin with smallest failing argument
    argument   4000 C10A 7DC0 DC46 D753   (decimal 3.0162653335001840718)
    actual     3FFB FFFF BBF1 3588 24AF   (decimal 0.1249994929300478145)
    x87 fpu    3FFB FFFF BBF1 3588 24AE   (decimal 0.12499949293004781449)
    error      -1.0002171407788819287 ulp
    
    sin near pi
    argument   4000 C90F DAA2 2168 C235   (decimal 3.1415926535897932385)
    actual     BFBE ECE6 75D1 FC8F 8CBB   (decimal -5.0165576126683320235E-20)
    x87 fpu    BFBF 8000 0000 0000 0000   (decimal -5.42101086242752217E-20)
    error      -1376283091369227076.6 ulp
    
    sin with large argument
    argument   403D FFFF FFFF 2D2A 9042   (decimal 9223372035086174241)
    actual     BFDF E730 CF55 1180 63F3   (decimal -4.2053336735954077951E-10)
    x87 fpu    BFF8 C28B 4641 7452 B463   (decimal -0.011874025925697012908)
    error      -4.7037861121081250351E+26 ulp
    
    cos with smallest failing argument
    argument   3FFF C10E 8AC0 BFEB 5E80   (decimal 1.5082562867317745453)
    actual     3FFA FFFF 3EA3 D2D7 355B   (decimal 0.062499279677629184442)
    x87 fpu    3FFA FFFF 3EA3 D2D7 355A   (decimal 0.062499279677629184438)
    error      -1.005468872258621479 ulp
    
    cos near pi/2
    argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
    actual     BFBD ECE6 75D1 FC8F 8CBB   (decimal -2.5082788063341660117E-20)
    x87 fpu    BFBE 8000 0000 0000 0000   (decimal -2.710505431213761085E-20)
    error      -1376283091369227076.6 ulp
    
    cos with large argument
    argument   403D FFFF FFFF 6CE1 B432   (decimal 9223372035620657689)
    actual     3FDD DFD2 E369 AE25 7E4A   (decimal 1.0178327217734091432E-10)
    x87 fpu    BFF8 C28B 45B2 1490 D117   (decimal -0.011874025404105249357)
    error      -1.8815144449581111989E+27 ulp
    
    tan with smallest failing argument
    argument   3FFF B8B5 07B4 294A BD53   (decimal 1.4430245999997931928)
    actual     4001 F915 0EE5 BAC8 446C   (decimal 7.7838205801874740721)
    x87 fpu    4001 F915 0EE5 BAC8 446D   (decimal 7.7838205801874740726)
    error      1.0017725812707024772 ulp
    
    tan near pi/2
    argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
    actual     C040 8A51 E04D AABD A35F   (decimal -39867976298117107068)
    x87 fpu    C040 8000 0000 0000 0000   (decimal -36893488147419103232)
    error      743622037674500958.81 ulp
    
    tan with large argument
    argument   403D FFFF FFFF DCF6 FE38   (decimal 9223372036560879388)
    actual     4005 A86C 499C 14EA BD4A   (decimal 84.211499097398127292)
    x87 fpu    401F C10C D618 50D5 E957   (decimal 6477687856.6315280604)
    error      9.3353319161898434351E+26 ulp
    

    When I run the program on a laptop with an AMD Ryzen 7 2700U (Zen), I get the following:

    sin with smallest failing argument
    argument   4000 C10A 7DC0 DC46 D753   (decimal 3.0162653335001840718)
    actual     3FFB FFFF BBF1 3588 24AF   (decimal 0.1249994929300478145)
    x87 fpu    3FFB FFFF BBF1 3588 24AE   (decimal 0.12499949293004781449)
    error      -1.0002171407788819287 ulp
    
    sin near pi
    argument   4000 C90F DAA2 2168 C235   (decimal 3.1415926535897932385)
    actual     BFBE ECE6 75D1 FC8F 8CBB   (decimal -5.0165576126683320235E-20)
    x87 fpu    BFBF 8000 0000 0000 0000   (decimal -5.42101086242752217E-20)
    error      -1376283091369227076.6 ulp
    
    sin with large argument
    argument   403D FFFF FFFF 2D2A 9042   (decimal 9223372035086174241)
    actual     BFDF E730 CF55 1180 63F3   (decimal -4.2053336735954077951E-10)
    x87 fpu    BFF8 C28B 4641 7452 B463   (decimal -0.011874025925697012908)
    error      -4.7037861121081250351E+26 ulp
    
    cos with smallest failing argument
    argument   3FFF C10E 8AC0 BFEB 5E80   (decimal 1.5082562867317745453)
    actual     3FFA FFFF 3EA3 D2D7 355B   (decimal 0.062499279677629184442)
    x87 fpu    3FFA FFFF 3EA3 D2D7 355A   (decimal 0.062499279677629184438)
    error      -1.005468872258621479 ulp
    
    cos near pi/2
    argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
    actual     BFBD ECE6 75D1 FC8F 8CBB   (decimal -2.5082788063341660117E-20)
    x87 fpu    BFBE 8000 0000 0000 0000   (decimal -2.710505431213761085E-20)
    error      -1376283091369227076.6 ulp
    
    cos with large argument
    argument   403D FFFF FFFF 6CE1 B432   (decimal 9223372035620657689)
    actual     3FDD DFD2 E369 AE25 7E4A   (decimal 1.0178327217734091432E-10)
    x87 fpu    BFF8 C28B 45B2 1490 D117   (decimal -0.011874025404105249357)
    error      -1.8815144449581111989E+27 ulp
    
    tan with smallest failing argument
    argument   3FFF B8B5 07B4 294A BD53   (decimal 1.4430245999997931928)
    actual     4001 F915 0EE5 BAC8 446C   (decimal 7.7838205801874740721)
    x87 fpu    4001 F915 0EE5 BAC8 446C   (decimal 7.7838205801874740721)
    error      0.0017725812707024772387 ulp
    
    tan near pi/2
    argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
    actual     C040 8A51 E04D AABD A35F   (decimal -39867976298117107068)
    x87 fpu    C040 8000 0000 0000 0000   (decimal -36893488147419103232)
    error      743622037674500958.81 ulp
    
    tan with large argument
    argument   403D FFFF FFFF DCF6 FE38   (decimal 9223372036560879388)
    actual     4005 A86C 499C 14EA BD4A   (decimal 84.211499097398127292)
    x87 fpu    401F C10C D618 50D5 E957   (decimal 6477687856.6315280604)
    error      9.3353319161898434351E+26 ulp
    

    With one exception (tan with smallest failing argument, where the AMD result is within 1 ulp of the actual value), the results are identical to the Intel reference output. I also tested on my Ryzen 9 3950X (Zen 2) and got the same results.

    In conclusion, these results indicate that recent AMD processors, including those based on the Zen and Zen 2 architectures, use a 66-bit approximation of pi and, for certain arguments, produce the same kinds of inaccuracies in x87 trigonometric instructions as modern Intel processors.
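For readers who want to check the listings by hand, the following is a small sketch that decodes the five 16-bit hex words printed by the tool and counts the distance in the last place between the "actual" and "x87 fpu" lines. The helper names `decode_extended` and `ulp_distance` are hypothetical and not part of fpuaccuracy. The distance is measured against the rounded "actual" value, so it is an integer, whereas the tool reports a fractional error against the unrounded result.

```python
def decode_extended(hex_words):
    """Split an 80-bit x87 value, written as five 16-bit hex words,
    into (sign, unbiased exponent, 64-bit significand)."""
    words = hex_words.split()
    sign_exp = int(words[0], 16)
    significand = int("".join(words[1:]), 16)
    sign = -1 if sign_exp & 0x8000 else 1
    exponent = (sign_exp & 0x7FFF) - 16383
    return sign, exponent, significand


def ulp_distance(actual, fpu):
    """Signed distance (fpu - actual) in units of the last place.
    Only valid when both values share sign and exponent."""
    sa, ea, ma = decode_extended(actual)
    sf, ef, mf = decode_extended(fpu)
    if (sa, ea) != (sf, ef):
        raise ValueError("sign or exponent differs; compare exactly instead")
    return sa * (mf - ma)


# Values copied from the listings above.
sin_smallest = ulp_distance("3FFB FFFF BBF1 3588 24AF", "3FFB FFFF BBF1 3588 24AE")
tan_intel = ulp_distance("4001 F915 0EE5 BAC8 446C", "4001 F915 0EE5 BAC8 446D")
tan_amd = ulp_distance("4001 F915 0EE5 BAC8 446C", "4001 F915 0EE5 BAC8 446C")

print("sin, smallest failing argument:", sin_smallest)
print("tan, Intel reference output   :", tan_intel)
print("tan, AMD Ryzen 7 2700U        :", tan_amd)

# The "sin near pi" result decodes to -(2**63) * 2**(-64 - 63) = -2**-64.
print("sin near pi decodes to        :", decode_extended("BFBF 8000 0000 0000 0000"))
```

The same helper refuses to compare the "near pi" and "large argument" cases, since there the two values differ in exponent and an integer last-place count would be misleading.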