Search code examples
pythonpython-2.7mathnumpy32-bit

Bit hack with NumPy


I was trying to implement the generalized version of the fast inverse square root that I found here and this is what I came up with so far:

import numpy as np

def get_K(exponent, B=127, L=2**23, sigma=0.0450465, f=np.float32):
    return f((1 - exponent) * L * (B - f(sigma)))

def get_result(exponent, B=127, L=2**23, sigma=0.0450465, f=np.float32):
    K = f(get_K(exponent, 127, 2**23, f(0.0450465)))
    return lambda num: (K + f(num*exponent))

if __name__ == '__main__':
    print((get_result(0.5)(2)).astype(np.int32))

but when I run the above example, I get 532487680, which is the same result that I get in numpy.float32 representation for get_result(0.5)(2).

What am I doing wrong? In other words, how do I go from treating the number from a 32-bit floating point number to a 32-bit integer in the same way that I would in C, using numpy?


Solution

  • The following fast inverse square root implementation could be used with numpy (adapted from [1] ),

    def fast_inv_sqrt(x):
        x = x.astype('float32')
        x2 = x * 0.5;
        y = x.view(dtype='int32')
        y = 0x5f3759df - np.right_shift(y, 1)
        y = y.view(dtype='float32')
        y  = y * ( 1.5 - ( x2 * y * y ) )
        return y
    

    now since numpy would be allocating a number of temporary arrays this is not very fast,

     import numpy as np
    
     x = np.array(1,10000, dtype='float32')
    
     %timeit fast_inv_sqrt(x)
     # 10000 loops, best of 3: 36.2 µs per loop
    
     %timeit 1./np.sqrt(x)
     # 10000 loops, best of 3: 13.1 µs per loop
    

    If you require speed, you should rather perform this calculation in C, and write a python interface using Cython, f2py, etc.