I would like to use my CPU's built-in instructions from within Numba-compiled functions, but I am having trouble figuring out how to reference them. Take, for example, the popcnt instruction from the SSE4 instruction set: I can confirm that my CPU supports it using llvmlite.binding.get_host_cpu_features(), but I have no way of calling the instruction itself.
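For reference, the feature check mentioned above can be sketched roughly as follows. The helper name host_has_feature is made up for illustration, and the sketch assumes that LLVM reports the instruction under the key "popcnt" on x86 hosts. It only confirms that the CPU supports the instruction; it does not make the instruction callable from Numba.

```python
def host_has_feature(name):
    """Return True if LLVM reports that the host CPU supports `name`.

    Hypothetical helper, not part of llvmlite or Numba.
    """
    try:
        import llvmlite.binding as llvm
        # Dict-like mapping of feature name -> bool for the host CPU.
        features = llvm.get_host_cpu_features()
    except (ImportError, RuntimeError):
        # llvmlite is not installed, or LLVM could not detect the features.
        return False
    return bool(features.get(name, False))


# "popcnt" is assumed to be the key LLVM uses on x86 hosts.
print("popcnt available:", host_has_feature("popcnt"))
```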
I need to be able to call these functions (instructions) from within other nopython compiled functions.
Ideally this would be done as close to pure Python as possible, but in this case speed is more important than readability.
You can use Cython to call SSE intrinsics, but you cannot use Numba to do it. Code doing what you want via Cython is here: https://gist.github.com/aldro61/f604a3fa79b3dec5436a and here: https://gist.github.com/craffel/e470421958cad33df550