gcc clang sse avx avx512

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?


I'm trying to optimize some matrix computations, and I was wondering whether it is possible to detect at compile time if SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI[1] is enabled by the compiler. Ideally for both GCC and Clang, but I can manage with only one of them.

I'm not sure it is possible, and perhaps I will fall back on my own macro, but I'd prefer detecting it rather than asking the user to select it.


[1] "KCVI" stands for Knights Corner Vector Instruction optimizations. Libraries like FFTW detect/utilize these newer instruction optimizations.


Solution

  • Most compilers will automatically define:

    __SSE__
    __SSE2__
    __SSE3__
    __AVX__
    __AVX2__
    

    etc., according to whatever command-line switches you pass. You can easily check this with gcc (or gcc-compatible compilers such as clang) by dumping the predefined macros, like this:

    $ gcc -msse3 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
    #define __SSE__ 1
    #define __SSE2__ 1
    #define __SSE2_MATH__ 1
    #define __SSE3__ 1
    #define __SSE_MATH__ 1
    

    or:

    $ gcc -mavx2 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
    #define __AVX__ 1
    #define __AVX2__ 1
    #define __SSE__ 1
    #define __SSE2__ 1
    #define __SSE2_MATH__ 1
    #define __SSE3__ 1
    #define __SSE4_1__ 1
    #define __SSE4_2__ 1
    #define __SSE_MATH__ 1
    #define __SSSE3__ 1
    

    or to just check the pre-defined macros for a default build on your particular platform:

    $ gcc -dM -E - < /dev/null | egrep "SSE|AVX" | sort
    #define __SSE2_MATH__ 1
    #define __SSE2__ 1
    #define __SSE3__ 1
    #define __SSE_MATH__ 1
    #define __SSE__ 1
    #define __SSSE3__ 1
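
    In source code, these macros are typically tested with #if / #ifdef to select an implementation. The following is only a minimal sketch of that pattern, not code from any particular library: the names add_floats, add_floats_backend and VEC_BACKEND are made up for illustration, and using unaligned loads and stores is an assumption made to keep the example simple.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the widest instruction set the compiler was told to target.
 * VEC_BACKEND is a hypothetical name used only in this sketch. */
#if defined(__AVX__)
#include <immintrin.h>
#define VEC_BACKEND "AVX"
#elif defined(__SSE__)
#include <xmmintrin.h>
#define VEC_BACKEND "SSE"
#else
#define VEC_BACKEND "scalar"
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). Pointers may be unaligned. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
#if defined(__AVX__)
    /* 8 floats per iteration with 256-bit registers. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
#elif defined(__SSE__)
    /* 4 floats per iteration with 128-bit registers. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: the whole array if no SIMD macro is set, the tail otherwise. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Reports which branch was compiled in. */
const char *add_floats_backend(void)
{
    return VEC_BACKEND;
}
```

    The selection happens entirely at compile time: a binary built with -mavx or -mavx2 takes the AVX branch and still needs a CPU that supports AVX when it runs.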
    

    More recent Intel processors support AVX-512, which is not a monolithic instruction set: different processors implement different subsets, and each subset has its own macro. The two examples below show what GCC (version 6.2) defines for two such targets.

    Here is Knights Landing:

    $ gcc -march=knl -dM -E - < /dev/null | egrep "SSE|AVX" | sort
    #define __AVX__ 1
    #define __AVX2__ 1
    #define __AVX512CD__ 1
    #define __AVX512ER__ 1
    #define __AVX512F__ 1
    #define __AVX512PF__ 1
    #define __SSE__ 1
    #define __SSE2__ 1
    #define __SSE2_MATH__ 1
    #define __SSE3__ 1
    #define __SSE4_1__ 1
    #define __SSE4_2__ 1
    #define __SSE_MATH__ 1
    #define __SSSE3__ 1
    

    Here is Skylake AVX-512:

    $ gcc -march=skylake-avx512 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
    #define __AVX__ 1
    #define __AVX2__ 1
    #define __AVX512BW__ 1
    #define __AVX512CD__ 1
    #define __AVX512DQ__ 1
    #define __AVX512F__ 1
    #define __AVX512VL__ 1
    #define __SSE__ 1
    #define __SSE2__ 1
    #define __SSE2_MATH__ 1
    #define __SSE3__ 1
    #define __SSE4_1__ 1
    #define __SSE4_2__ 1
    #define __SSE_MATH__ 1
    #define __SSSE3__ 1
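
    Because each AVX-512 subset has its own macro, code usually has to test for the exact combination it needs rather than for "AVX-512" as a whole. As a rough sketch (the enum constants and function names below are hypothetical, chosen only for this example), the subsets from the two listings above can be collected into a bitmask:

```c
#include <assert.h>

/* Hypothetical flag names for this sketch: one bit per AVX-512 subset. */
enum {
    AVX512_F   = 1u << 0,
    AVX512_CD  = 1u << 1,
    AVX512_ER  = 1u << 2,
    AVX512_PF  = 1u << 3,
    AVX512_BW  = 1u << 4,
    AVX512_DQ  = 1u << 5,
    AVX512_VL  = 1u << 6,
    AVX512_ALL = 0x7F
};

/* Returns the subsets the compiler was told to target for this build. */
unsigned avx512_compiled_subsets(void)
{
    unsigned mask = 0;
#ifdef __AVX512F__
    mask |= AVX512_F;
#endif
#ifdef __AVX512CD__
    mask |= AVX512_CD;
#endif
#ifdef __AVX512ER__
    mask |= AVX512_ER;
#endif
#ifdef __AVX512PF__
    mask |= AVX512_PF;
#endif
#ifdef __AVX512BW__
    mask |= AVX512_BW;
#endif
#ifdef __AVX512DQ__
    mask |= AVX512_DQ;
#endif
#ifdef __AVX512VL__
    mask |= AVX512_VL;
#endif
    return mask;
}

/* Nonzero if every subset in `required` is present in `mask`. */
int avx512_has_all(unsigned mask, unsigned required)
{
    assert((required & ~(unsigned)AVX512_ALL) == 0); /* only known bits */
    return (mask & required) == required;
}
```

    Going by the listings above, avx512_has_all(mask, AVX512_F | AVX512_BW) would hold for a -march=skylake-avx512 build but not for a -march=knl build. In real kernels the same test is normally written directly in the preprocessor, e.g. #if defined(__AVX512F__) && defined(__AVX512BW__).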
    

    Intel has disclosed additional AVX-512 subsets (see ISA extensions). GCC (version 7) supports compiler flags and preprocessor symbols associated with the 4FMAPS, 4VNNIW, IFMA, VBMI and VPOPCNTDQ subsets of AVX-512:

    $ for i in 4fmaps 4vnniw ifma vbmi vpopcntdq ; do echo "==== $i ====" ; gcc -mavx512$i -dM -E - < /dev/null | egrep "AVX512" | sort ; done
    ==== 4fmaps ====
    #define __AVX5124FMAPS__ 1
    #define __AVX512F__ 1
    ==== 4vnniw ====
    #define __AVX5124VNNIW__ 1
    #define __AVX512F__ 1
    ==== ifma ====
    #define __AVX512F__ 1
    #define __AVX512IFMA__ 1
    ==== vbmi ====
    #define __AVX512BW__ 1
    #define __AVX512F__ 1
    #define __AVX512VBMI__ 1
    ==== vpopcntdq ====
    #define __AVX512F__ 1
    #define __AVX512VPOPCNTDQ__ 1
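
    A subset macro can then guard code that uses the matching intrinsics. The sketch below assumes the AVX-512 intrinsics from <immintrin.h> and falls back to a portable bit-counting loop when __AVX512VPOPCNTDQ__ is not defined; popcount_u64_array is a made-up name, not an API from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
#include <immintrin.h>
#endif

/* Total number of set bits in data[0..n). */
uint64_t popcount_u64_array(const uint64_t *data, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;
    assert(n == 0 || data != NULL);

#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
    {
        /* Count bits in eight 64-bit lanes at a time. */
        __m512i acc = _mm512_setzero_si512();
        uint64_t lanes[8];
        int k;
        for (; i + 8 <= n; i += 8) {
            __m512i v = _mm512_loadu_si512((const void *)(data + i));
            acc = _mm512_add_epi64(acc, _mm512_popcnt_epi64(v));
        }
        /* Spill the per-lane counts and add them up. */
        _mm512_storeu_si512((void *)lanes, acc);
        for (k = 0; k < 8; ++k)
            total += lanes[k];
    }
#endif

    /* Portable path: whole array without the macro, tail otherwise. */
    for (; i < n; ++i) {
        uint64_t x = data[i];
        while (x) {
            x &= x - 1; /* clear the lowest set bit */
            ++total;
        }
    }
    return total;
}
```

    Only builds made with -mavx512vpopcntdq (or a -march value that implies it) compile the vector branch; every other build gets the scalar loop alone.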
    

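    Note that the SSE macros (__SSE__, __SSE2__, etc.) are not defined by Visual C++. For 32-bit x86 builds you have to check _M_IX86_FP instead (1 for /arch:SSE, 2 for /arch:SSE2 and above); x64 builds always have SSE2, so there it is enough to test _M_X64. Visual C++ does define __AVX__ and __AVX2__ when the corresponding /arch option is used.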
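
    One way to hide this difference is a small set of wrapper macros that accept either compiler's spelling. This is a sketch under stated assumptions, not a tested compatibility header: the HAVE_* names and simd_baseline are hypothetical, and it assumes that _M_IX86_FP >= 1 means SSE and >= 2 means SSE2 on 32-bit Visual C++, that _M_X64 implies SSE2, and that Visual C++ defines __AVX__ under /arch:AVX and higher.

```c
#include <assert.h>

/* HAVE_* are hypothetical names used only in this sketch. */

/* SSE2: GCC/Clang macro, any x64 MSVC build, or 32-bit MSVC with /arch:SSE2+. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* SSE: GCC/Clang macro, implied by SSE2, or 32-bit MSVC with /arch:SSE+. */
#if defined(__SSE__) || HAVE_SSE2 || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* AVX: same macro name on GCC, Clang and MSVC (assumption stated above). */
#if defined(__AVX__)
#define HAVE_AVX 1
#else
#define HAVE_AVX 0
#endif

/* Name of the highest level enabled for this build. */
const char *simd_baseline(void)
{
    assert(HAVE_SSE2 <= HAVE_SSE); /* SSE2 never appears without SSE */
    if (HAVE_AVX)
        return "AVX";
    if (HAVE_SSE2)
        return "SSE2";
    if (HAVE_SSE)
        return "SSE";
    return "none";
}
```

    The rest of the code can then test HAVE_SSE2 or HAVE_AVX in #if directives without caring which compiler is in use.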