windows x86-64 trigonometry sse avx

Is it possible to get multiple sines in AVX/SSE?


I'm trying to write a C++ program that calls a function I wrote in x64 assembly. I'd like to speed things up a little (and play with CPU features), so I chose to use vector operations.

The problem is that I have to multiply sines by an integer, so I have to calculate the sines first. Is it possible to do this in SSE/AVX? I'm aware of the fsin instruction, but not only is it an x87 FPU instruction, it also calculates only one sine at a time. So I'd have to push the value onto the FPU stack, call fsin, pop the result from the FPU to memory, and then load it into an AVX register. It seems to me it's not worth the hassle.


Solution

  • Yes, there is a vector version for SSE/AVX! The catch is that you need the Intel C++ compiler.

    These are the intrinsics of Intel's Short Vector Math Library (SVML):

    For 128-bit SSE (double precision), use _mm_sin_pd

    For 256-bit AVX (double precision), use _mm256_sin_pd

    Both intrinsics are actually very small functions consisting of hand-written SSE/AVX assembly, so with AVX you can compute 4 sines at once :=) The latency is about 10 clock cycles (if I remember correctly) on a Haswell CPU.

    By the way, the CPU needs to execute about 100 such intrinsics to warm up and reach peak performance. If only a few sines need to be evaluated, it's better to use plain sin() instead.

    Good luck!!
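
As a rough illustration of the approach above, here is a minimal sketch of "multiply sines by an integer" using the 128-bit variant. The helper name scaled_sines is hypothetical, not part of any library. The sketch assumes _mm_sin_pd is only available when an Intel compiler macro is defined; on other compilers it falls back to calling std::sin per lane and packing the results, which is my own substitution and is not vectorized sine. The multiply itself uses plain SSE2 in both cases.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical helper: out[i] = scale * sin(in[i]) for i in [0, n).
void scaled_sines(const double* in, double* out, std::size_t n, int scale) {
    assert(in != nullptr && out != nullptr);
    std::size_t i = 0;
#if defined(HAVE_SSE2)
    // Broadcast the integer factor (converted to double) into both lanes.
    const __m128d vscale = _mm_set1_pd(static_cast<double>(scale));
    for (; i + 2 <= n; i += 2) {
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
        // SVML path (assumed Intel compiler): two sines in one vector call.
        __m128d s = _mm_sin_pd(_mm_loadu_pd(in + i));
#else
        // Fallback without SVML: scalar sin per lane, then pack.
        // _mm_set_pd takes the high lane first, so in[i] lands in the low lane.
        __m128d s = _mm_set_pd(std::sin(in[i + 1]), std::sin(in[i]));
#endif
        _mm_storeu_pd(out + i, _mm_mul_pd(s, vscale));
    }
#endif
    // Scalar tail (or the whole array when SSE2 is unavailable).
    for (; i < n; ++i) {
        out[i] = scale * std::sin(in[i]);
    }
}
```

Calling scaled_sines(in, out, n, 3) fills out with 3 * sin(in[i]). The same structure should carry over to AVX by switching to __m256d, _mm256_sin_pd and a step of 4, assuming the compiler is told to target AVX.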