
np.sqrt with integers and where condition returns wrong results


I am getting strange results from numpy.sqrt when applying it to an array of integers with a where condition. See below.

With integers:

a = np.array([1, 4, 9])

np.sqrt(a, where=(a>5))
Out[3]: array([0. , 0.5, 3. ])

With floats:

a = np.array([1., 4., 9.])

np.sqrt(a, where=(a>5))
Out[25]: array([1., 4., 3.])

Is it a bug or a misunderstanding of how this method works?


Solution

  • The results are inconsistent*: repeating the same command several times does not always give the same output.

    *This is most likely because the output array is allocated without being initialized, as numpy.empty does. The memory is reused as-is, so every cell that is not explicitly overwritten holds an unpredictable value.

    Here is an example:

    import numpy as np
    print(np.__version__)
    
    a = np.array([1, 4, 9])                   # integers
    print(np.sqrt(a, where=(a>5)).round(3))
    
    a = np.array([1., 4., 9.])                # floats
    print(np.sqrt(a, where=(a>5)).round(3))
    
    a = np.array([1, 4, 9])                   # integers again
    print(np.sqrt(a, where=(a>5)).round(3))
    

    Output:

    1.24.2
    [0. 0. 3.]
    [0. 0. 3.]
    [1. 4. 3.]  # this is now different
    

    As @hpaulj remarked, you need to provide out to get a well-defined result. Doing so removes the inconsistency seen in the example above.

    Initialize out with a function that sets every value, such as numpy.zeros:

    import numpy as np
    print(np.__version__)
    
    out = np.zeros(3)
    a = np.array([1, 4, 9])      # integers
    print(np.sqrt(a, where=(a>5), out=out))
    print(out)
    
    out = np.zeros(3)
    a = np.array([1., 4., 9.])   # floats
    print(np.sqrt(a, where=(a>5), out=out))
    print(out)
    
    out = np.zeros(3)
    a = np.array([1, 4, 9])      # integers again
    print(np.sqrt(a, where=(a>5), out=out))
    print(out)
    

    Output:

    1.24.2
    [0. 0. 3.]
    [0. 0. 3.]
    [0. 0. 3.]
    [0. 0. 3.]
    [0. 0. 3.]
    [0. 0. 3.]
    

    Still, the default behavior is surprising, although it is documented.

    The doc specifies that if out is not provided, a freshly allocated array is returned, and that locations where the condition is False remain uninitialized:

    out : ndarray, None, or tuple of ndarray and None, optional

    A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. A tuple (possible only as a keyword argument) must have length equal to the number of outputs.

    where : array_like, optional

    This condition is broadcast over the input. At locations where the condition is True, the out array will be set to the ufunc result. Elsewhere, the out array will retain its original value. Note that if an uninitialized out array is created via the default out=None, locations within it where the condition is False will remain uninitialized.
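
    To see the "retain its original value" rule in isolation, here is a minimal sketch that pre-fills out with a sentinel. The value -1.0 is an arbitrary choice for illustration, not something NumPy requires:

```python
import numpy as np

a = np.array([1, 4, 9])

# Pre-fill out with a sentinel so untouched slots are easy to spot.
# (-1.0 is an arbitrary illustrative value.)
out = np.full(a.shape, -1.0)

# sqrt is written only where the mask is True; other slots keep the sentinel.
result = np.sqrt(a, where=a > 5, out=out)

print(result.tolist())  # [-1.0, -1.0, 3.0]
print(result is out)    # True: the ufunc returns the array passed as out
```

    Only the last slot is overwritten. The other two keep the sentinel, and those are the slots that would hold arbitrary memory with the default out=None.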

    In your case, you probably want numpy.where instead:

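    # compute sqrt everywhere, then keep it only where a > 5
    out = np.where(a > 5, np.sqrt(a), a)
    
    # or, to avoid evaluating sqrt on values where it is invalid (e.g. negatives)
    out = np.where(a > 5, np.sqrt(a, where=a > 5), a)
    
    # or, start from a float copy of a and overwrite only where a > 5
    out = np.sqrt(a, where=a > 5, out=a.astype(float))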
    

    Output: array([1., 4., 3.])
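
    If you need this pattern in several places, the last variant could be wrapped in a small function. The sketch below is one possible way to do it; sqrt_where is a hypothetical name, not a NumPy function, and it assumes you want the original values (converted to float) wherever the mask is False:

```python
import numpy as np

def sqrt_where(a, mask):
    """Square root where mask is True, original values (as floats) elsewhere.

    Hypothetical helper for illustration; not part of NumPy.
    """
    a = np.asarray(a)
    out = a.astype(float)            # float copy, so every slot is defined up front
    np.sqrt(a, where=mask, out=out)  # overwrite only the selected slots
    return out

a = np.array([1, 4, 9])
res_int = sqrt_where(a, a > 5)
print(res_int.tolist())              # [1.0, 4.0, 3.0]

b = np.array([-1.0, 4.0, 9.0])
with np.errstate(invalid="raise"):   # turn the "invalid value" warning into an error
    res_neg = sqrt_where(b, b > 5)   # the masked-out -1.0 is skipped, so nothing is raised
print(res_neg.tolist())              # [-1.0, 4.0, 3.0]
```

    Because out is fully initialized before the call, the result does not depend on whatever was previously in memory, and repeated calls give the same output.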