c++ gcc clang sse intrinsics

Clang/GCC Compiler Intrinsics without corresponding compiler flag


I know there are similar questions to this one, but compiling different files with different flags is not an acceptable solution here, since it would complicate the codebase very quickly. An answer of "No, it is not possible" will do.


Is it possible, in any version of Clang or GCC, to compile intrinsic functions for SSE2/SSE3/SSSE3/SSE4.1 while allowing the compiler to use only the SSE instruction set for its own optimization?

EDIT: For example, I want the compiler to turn _mm_load_si128() into movdqa, but the compiler must not emit this instruction anywhere other than at this intrinsic call, similar to how the MSVC compiler works.

EDIT2: I have a dynamic dispatcher in place and several versions of a single function, each written with intrinsics for a different instruction set. Using multiple files would make this much harder to maintain, since the versions of the same code would be spread across several files, and there are a lot of functions of this type.

EDIT3: Example source code as requested: https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/filters/resample.cpp or most files in that folder, really.


Solution

  • Here is an approach using GCC that might be acceptable. All of the code goes into a single source file, which is divided into sections. One section is compiled according to the command-line options; functions such as main() and the processor feature detection go there. Another section is compiled according to a target override pragma, and any intrinsic function supported by the override target can be used in it. Functions in that section should be called only after processor feature detection has confirmed that the required features are present. This example has a single override section for AVX2 code; multiple override sections can be used when writing functions optimized for several targets.

    // temporarily switch target so that all x64 intrinsic functions will be available
    #pragma GCC push_options
    #pragma GCC target ("arch=core-avx2")
    #include <immintrin.h>   // GCC/Clang header for SSE/AVX intrinsics (<intrin.h> is the MSVC/MinGW one)
    // restore the target selection
    #pragma GCC pop_options
    
    //----------------------------------------------------------------------------
    // the following functions will be compiled using default code generation
    //----------------------------------------------------------------------------
    
    int dummy1 (int a) {return a;}
    
    //----------------------------------------------------------------------------
    // the following functions will be compiled using core-avx2 code generation
    // all x64 intrinsic functions are available
    #pragma GCC push_options
    #pragma GCC target ("arch=core-avx2")
    //----------------------------------------------------------------------------
    
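    // shifts the 256-bit value in *data left by count bits (intended for 0 <= count <= 64)
    // and returns the bits shifted out of the top qword in the low qword of the result
    static __m256i bitShiftLeft256ymm (__m256i *data, int count)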
       {
       __m256i innerCarry, carryOut, rotate;
    
       innerCarry = _mm256_srli_epi64 (*data, 64 - count);                        // carry-outs in the low bits of each qword
       rotate     = _mm256_permute4x64_epi64 (innerCarry, 0x93);                  // rotate ymm left 64 bits
       innerCarry = _mm256_blend_epi32 (_mm256_setzero_si256 (), rotate, 0xFC);   // clear lower qword
       *data    = _mm256_slli_epi64 (*data, count);                               // shift all qwords left
       *data    = _mm256_or_si256 (*data, innerCarry);                            // propagate carries from low qwords
       carryOut   = _mm256_xor_si256 (innerCarry, rotate);                        // clear all except lower qword
       return carryOut;
       }
    
    //----------------------------------------------------------------------------
    // the following functions will be compiled using default code generation
    #pragma GCC pop_options
    //----------------------------------------------------------------------------
    
    int main (void)
        {
        return 0;
        }
    
    //----------------------------------------------------------------------------
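
Since the question mentions a dynamic dispatcher and SSE-level targets, here is a minimal sketch of how the same idea might look per function rather than per section. It assumes a reasonably recent GCC or Clang on x86, where `__attribute__((target("...")))` and `__builtin_cpu_supports()` are available. The kernel (`max_i32` and its two variants) and the `HAVE_X86_DISPATCH` macro are hypothetical names made up for illustration; they are not taken from the linked code.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1   // hypothetical macro, only used in this sketch
#endif

// Baseline version: compiled with whatever the command-line flags allow.
// Requires n >= 1.
static int32_t max_i32_scalar(const int32_t* p, size_t n) {
    int32_t m = p[0];
    for (size_t i = 1; i < n; ++i)
        if (p[i] > m) m = p[i];
    return m;
}

#ifdef HAVE_X86_DISPATCH
// Only this function is compiled with SSE4.1 enabled; the rest of the
// translation unit keeps the default target. Requires n >= 1.
__attribute__((target("sse4.1")))
static int32_t max_i32_sse41(const int32_t* p, size_t n) {
    if (n < 4) return max_i32_scalar(p, n);
    __m128i vmax = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    size_t i = 4;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        vmax = _mm_max_epi32(vmax, v);          // pmaxsd, an SSE4.1 instruction
    }
    alignas(16) int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), vmax);
    int32_t m = max_i32_scalar(lanes, 4);       // reduce the four lanes
    for (; i < n; ++i)                          // leftover elements
        if (p[i] > m) m = p[i];
    return m;
}
#endif

// Dispatcher: picks the SSE4.1 version only if the running CPU reports it.
int32_t max_i32(const int32_t* p, size_t n) {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse4.1"))
        return max_i32_sse41(p, n);
#endif
    return max_i32_scalar(p, n);
}
```

Callers only ever use `max_i32()`, so all variants of one function can sit next to each other in the same file. Note that this does not restrict the SSE4.1 instructions to the intrinsic calls themselves: inside `max_i32_sse41` the optimizer is free to use SSE4.1 anywhere, which is why that function must be reached only through the runtime check.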