Tags: python, numpy, numba, single-precision

Binary operations on NumPy scalars automatically up-cast to float64


I want to do binary operations (such as addition and multiplication) between np.float32 and built-in Python int and float values and get a np.float32 back as the return type. Instead, the result is automatically up-cast to np.float64.

Example code:

>>> import numpy as np
>>> a = np.float32(5)
>>> a.dtype
dtype('float32')
>>> b = a + 2
>>> b.dtype
dtype('float64')

If I do this with a np.float128, b also becomes a np.float128. That is good, since it preserves precision. In my example, however, no up-cast to np.float64 is needed to preserve precision, yet it happens anyway. Had I added 2.0 (a 64-bit Python float) to a instead of 2, the cast would at least make sense, but even then I do not want it.
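For comparison, here is a minimal REPL sketch of both cases (assuming a platform that provides np.float128, and continuing the session above):

>>> c = np.float128(5)
>>> (c + 2).dtype
dtype('float128')
>>> (a + 2.0).dtype
dtype('float64')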

So my question is: how can I alter the casting applied when a binary operator combines a np.float32 with a built-in Python int or float? Alternatively, making single precision the default in all calculations rather than double would also count as a solution, since I never need double precision. Other people have asked for this, and it seems that no solution has been found.

I know about NumPy arrays and their dtypes. Here I get the wanted behavior, as an array always preserves its dtype; it is when I do an operation on a single element of an array that I get the unwanted behavior (see the sketch below). I have a vague idea for a solution involving subclassing np.ndarray (or np.float32) and changing the value of __array_priority__. So far I have not been able to get it working.
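As a minimal sketch of that contrast, relying on the same promotion rules described above: even a 0-d float32 array keeps its dtype, while extracting the scalar from it brings the unwanted behavior back:

>>> x = np.array(5, dtype=np.float32)
>>> (x + 2).dtype
dtype('float32')
>>> (x + 2.0).dtype
dtype('float32')
>>> (x[()] + 2).dtype
dtype('float64')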

Why do I care? I am trying to write an n-body code using Numba, which is why I cannot simply operate on the array as a whole. Changing all np.float64 to np.float32 gives a speed-up of about a factor of 2, which is important. The np.float64 casting behavior ruins this speed-up completely, as all operations on my np.float32 array are done in 64-bit precision and then down-cast to 32-bit precision.


Solution

  • I'm not sure about the NumPy behavior, or how exactly you're trying to use Numba, but being explicit about the Numba types might help. For example, if you do something like this:

    import numpy as np
    from numba import jit

    @jit
    def foo(a):
        return a[0] + 2

    a = np.array([3.3], dtype='f4')
    foo(a)
    

    The float32 value in a[0] is promoted to a float64 before the add operation (if you don't mind diving into LLVM IR, you can see this for yourself by running the code with the numba command and the --dump-llvm or --dump-optimized flag: numba --dump-optimized numba_test.py). However, by specifying the function signature, including the return type as float32:

    @jit('f4(f4[:])')
    def foo(a):
        return a[0] + 2
    

    The value in a[0] is not promoted to float64, although the result is cast to a float64 so it can be converted to a Python float object when the function returns to Python land.
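    If you'd rather stay inside Python than run the numba CLI, a hedged alternative sketch: compiled Numba dispatchers expose inspect_types(), which prints the type inferred for each expression (the exact output format varies across Numba releases):

    a = np.array([3.3], dtype='f4')
    foo(a)
    # Prints an annotated listing of foo; the add should be typed
    # float32 here, not float64.
    foo.inspect_types()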

    If you can allocate an array beforehand to hold the result, you can do something like this:

    @jit
    def foo(a, result):
        # Element-wise add in an explicit loop; everything stays float32.
        for i in range(a.size):
            result[i] = a[i] + 2

    a = np.arange(1000000, dtype='f4')
    result = np.zeros(1000000, dtype='f4')
    foo(a, result)
    

    Even though you're writing the loop yourself, the performance of the compiled code should be comparable to a NumPy ufunc, and no casts to float64 should occur (again, this can be verified by looking at the LLVM IR that Numba generates).
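    As a final sketch (assuming foo has already been compiled by calling it once), you can also scan the generated IR from Python instead of the numba CLI; fpext is the LLVM instruction that widens float32 to float64:

    # Check each compiled signature's LLVM IR for float32 -> float64 promotions.
    for sig, ir in foo.inspect_llvm().items():
        print(sig, 'contains fpext:', 'fpext' in ir)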