gcc, simd, auto-vectorization

-ftree-vectorize option in GNU


With the GCC compiler, the -ftree-vectorize option turns on auto-vectorization, and this flag is automatically set when using -O3. To what level does it vectorize? I.e., will I get SSE2, SSE4.2, AVX, or AVX2 instructions? I know of the existence of the -mavx and -mavx2 flags, etc., but I want to know what the compiler is doing without those specific flags to force a particular type of vectorization.


Solution

  • All x86 64-bit processors have at least SSE2. The GCC compiler will default to SSE2 code in 64-bit mode unless you tell it to use other hardware options.

    In 32-bit mode, GCC may use x87 instructions, which are not SIMD instructions, so to enable vectorization make sure to enable at least SSE with -mfpmath=sse -msse2.

    If you enable a higher SIMD level (for example with -msse4.1, -mavx, or -mavx2), then the compiler may (and in many cases will) use those newer instructions when vectorizing.

    I believe this is true as well with Clang. However, ICC and MSVC do things differently. ICC may create a CPU dispatcher to select the best hardware (or to veto AMD hardware). MSVC only has options for enabling AVX and AVX2 in 64-bit mode (/arch:AVX and /arch:AVX2; SSE2 is assumed). There is no way to explicitly enable e.g. SSE4.1 with MSVC. Instead, in some cases the auto-vectorizer will add code to check for SSE4.1 (but not AVX) and use those instructions. GCC will only use SSE4.1 if you tell it to, e.g. with -msse4.1 or something higher such as -mavx.
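
One way to check this on your own machine is to ask the compiler which SIMD level it was told to target. The sketch below is only an illustration: the function names simd_target_level and saxpy are made up for this example, and it relies on the predefined macros (__SSE2__, __SSE4_1__, __SSE4_2__, __AVX__, __AVX2__) that GCC and Clang define when the corresponding -m option is in effect. The loop is a simple candidate for the auto-vectorizer, not a guarantee that any particular instructions will be emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reports the highest x86 SIMD level the compiler
 * was told to target, based on GCC/Clang predefined macros. */
const char *simd_target_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "no x86 SIMD extension enabled";
#endif
}

/* Hypothetical example loop: y[i] = a * x[i] + y[i].
 * restrict tells the compiler the arrays do not overlap, which makes
 * the loop an easier target for the auto-vectorizer. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Built as 64-bit x86 code with plain -O3, simd_target_level should return "SSE2"; adding -mavx2 should change it to "AVX2". To see which loops were actually vectorized, GCC's -fopt-info-vec option prints a report during compilation, and inspecting the assembly (-S) shows whether xmm or ymm registers are used.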