How can I create a library that will dynamically switch between SSE, AVX, and AVX2 code paths depending on the host processor/OS? I am using Agner Fog's VCL (Vector Class Library) and compiling with GCC for Linux.
See the section "Instruction sets and CPU dispatching" in the manual for the Vector Class Library. In that section Agner writes:
The file dispatch_example.cpp shows an example of how to make a CPU dispatcher that selects the appropriate code version.
Read the source code of dispatch_example.cpp. At the start of the file you should see the comment:
# Compile dispatch_example.cpp five times for different instruction sets:
| g++ -O3 -msse2 -c dispatch_example.cpp -od2.o
| g++ -O3 -msse4.1 -c dispatch_example.cpp -od5.o
| g++ -O3 -mavx -c dispatch_example.cpp -od7.o
| g++ -O3 -mavx2 -c dispatch_example.cpp -od8.o
| g++ -O3 -mavx512f -c dispatch_example.cpp -od9.o
| g++ -O3 -msse2 -otest instrset_detect.cpp d2.o d5.o d7.o d8.o d9.o
| ./test
The last command also compiles and links instrset_detect.cpp. You should read its source code as well. This is the file that calls CPUID to find out which instruction sets are supported.
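To give a rough idea of the pattern before you read those files, here is a minimal sketch of a function-pointer dispatcher. It is not Agner's code. The names sum, sum_ptr, sum_dispatch, select_sum and the three sum_* versions are hypothetical, and I use GCC's __builtin_cpu_supports as a stand-in for the detection that instrset_detect.cpp performs. In a real build each version would come from the same source compiled with a different -m flag, as in the comment above. Here they are plain scalar stand-ins so the sketch compiles on its own.

```cpp
#include <cassert>

// Signature shared by every version of the (hypothetical) kernel.
typedef float (*sum_fn)(const float*, int);

// Stand-ins for the per-instruction-set versions. In a real project each one
// would be built from the same source with -msse2, -mavx, -mavx2, etc.
static float sum_sse2(const float* a, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}
static float sum_avx(const float* a, int n)  { return sum_sse2(a, n); }
static float sum_avx2(const float* a, int n) { return sum_sse2(a, n); }

// Pick the best version the host supports. This uses the GCC/Clang builtin
// as a stand-in for VCL's own detection code (an assumption of this sketch).
static sum_fn select_sum() {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
    if (__builtin_cpu_supports("avx"))  return sum_avx;
#endif
    return sum_sse2;  // baseline fallback
}

static float sum_dispatch(const float* a, int n);

// The pointer starts out aimed at the dispatcher, so the first call
// triggers detection and every later call goes straight to the chosen version.
static sum_fn sum_ptr = sum_dispatch;

static float sum_dispatch(const float* a, int n) {
    sum_ptr = select_sum();
    assert(sum_ptr != sum_dispatch);  // guard against infinite recursion
    return sum_ptr(a, n);
}

// Public entry point: callers never see the individual versions.
inline float sum(const float* a, int n) { return sum_ptr(a, n); }
```

Callers just call sum(data, n). The first call pays for the detection once, and after that the cost is a single indirect call, so keep the dispatched function large enough that this overhead does not matter.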
Here is a summary of some, but not all of, my questions and answers on CPU dispatchers.