How to use native popcount with numba

I am using numba 0.57.1 and I would like to exploit the native CPU popcount in my code. My existing code is too slow as I need to run it hundreds of millions of times. Here is a MWE:

import numba as nb

def popcount(x): 
      while(x > 0):
          x &= x - nb.uint64(1)   
      return b

The current speed is:

%timeit popcount(255)
148 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

I believe that it should be possible to use the native CPU popcount (_mm_popcnt_u64 as a C intrinsic) instruction using numba intrinsics but this area is new to me. The llvm intrinsic I need to use is, I think, ctpop. Assuming this is right, how can I do this?


  • Try:

    import numba as nb
    from numba import types
    from numba.cpython import mathimpl
    from numba.extending import intrinsic
    def popcnt(typingctx, src):
        sig = types.uint64(types.uint64)
        def codegen(context, builder, signature, args):
            return mathimpl.call_fp_intrinsic(builder, "llvm.ctpop.i64", args)
        return sig, codegen
    def popcount(x):
        return popcnt(x)