Search code examples
performancedelphi64-bitx86-64trigonometry

Faster sin() for x64


Main question

Does somebody have a fast sin() implementation for x64? It doesn't need to be pure pascal.

Explanation

I've got a VCL application that in some situations runs a lot slower when it's compiled for x64.

It does a lot of floating point 3d calculations, and I've tracked this down to the fact that System.Sin() and System.Cos() are a lot slower on x64, when the input values become large.

I've timed it by creating a simple test application that measures how long it takes to calculate sin(x), with different values for x, and the differences are HUGE:

              call:     x64:     x86:
              Sin(1)   16 ms    20 ms
             Sin(10)   30 ms    20 ms
            Sin(100)   32 ms    20 ms
           Sin(1000)   34 ms    21 ms
          Sin(10000)   30 ms    21 ms
         Sin(100000)   30 ms    16 ms
        Sin(1000000)   35 ms    20 ms
       Sin(10000000)  581 ms    20 ms
      Sin(100000000) 1026 ms    21 ms
     Sin(1000000000) 1187 ms    22 ms
    Sin(10000000000) 1320 ms    21 ms
   Sin(100000000000) 1456 ms    20 ms
  Sin(1000000000000) 1581 ms    17 ms
 Sin(10000000000000) 1717 ms    22 ms
Sin(100000000000000) 1846 ms    23 ms
           Sin(1E15) 1981 ms    21 ms
           Sin(1E16) 2100 ms    21 ms
           Sin(1E17) 2240 ms    22 ms
           Sin(1E18) 2372 ms    18 ms
                etc    etc      etc

What you see here is that sin(1E5) runs about 300 times as fast as sin(1E8).

In case you're interested, I've created the above table like this:

{$APPTYPE CONSOLE}
program SinTest;

uses Diagnostics, Math, SysUtils;

var
  i : Integer;
  x : double;
  sw: TStopwatch;

begin
  x := 1;

  while X < 1E18 do
  begin
    sw    := TStopwatch.StartNew;
    for i := 1 to 500000 do
      System.Sin(x);

    // WriteLn(System.sin(x), #9,System.Sin(fmod(x,2*pi)));

    sw.Stop;

    WriteLn('    ', ('Sin(' + round(x).ToString + ')'):20, ' ', sw.ElapsedMilliseconds,' ms');

    x := x * 10;
  end;

  WriteLn('Press any key to continue');
  readln;
end.

Notes:

  • There are some questions on StackOverflow regarding faster sine functions, but none of them have source code that is useful to port to Delphi, like this one: Fastest implementation of sine, cosine and square root in C++ (doesn't need to be much accurate)

  • The rest of the x64 runs faster than it's 32-bit counterpart

  • I've found a bit of crappy workaround, by doing this: Sin(FMod(x,2*pi)). It provides the correct results, and it runs fast for larger numbers. For smaller numbers it's a bit slower of course.


Solution

  • While this is probably to be rather strongly discouraged in user mode code (and is totally forbidden in kernel mode code), if you do want to keep the legacy x87 behaviour in your x64 code you could write a function like this :

    function SinX87(x:double):double;
    var
      d : double;
    asm
      movsd qword ptr [rbp+8], xmm0
      fld qword ptr [rbp+8]
      fsin
      fstp qword ptr [rbp+8]
      movsd xmm0, qword ptr [rbp+8]
    end;
    

    This adds a bit of overhead since you have to pop the value out of the SSE register onto the stack, load it into the x87 unit, peform the calculation, pop the value back onto the stack, and then load it back into XMM0 for the function result. The sin calculation is pretty heavy, though, so this is a relatively small overhead. I would only really do this if you needed to preserve whatever idiosyncracies of the x87's sin implementation.

    Other libraries exist that compute sin more efficiently in x64 code than Delphi's purepascal routines. My overwhelming preference here would be to export a good set of C++ routines to a DLL instead. Also, as David said, using trig functions with ridiculously large arguments is not really a sensible thing to do anyway.