Search code examples
pythonnumpyarray-broadcastingnumpy-ufunc

Numpy: Finding minimum and maximum values from associations through binning


Prerequisite

This is a question derived from this post. So, some of the introduction of the problem will be similar to that post.

Problem

Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with different values. Now, I have to find the minimum and maximum of the values grouped by associations.

An example for better clarification.

min_result array:

[[0, 0],
[0, 0],
[0, 0],
[0, 0]]

max_result array:

[[0, 0],
[0, 0],
[0, 0],
[0, 0]]

values array:

[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.]

Note: Here result arrays and values have the same number of elements. But it might not be the case. There is no relation between the sizes at all.

x_mapping and y_mapping have mappings from 1D values to 2D result(both min and max). The sizes of x_mapping, y_mapping and values will be the same.

x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]

y_mapping - [0, 3, 2, 2, 0, 3, 2, 1]

Here, 1st value(values[0]) and 5th value(values[4]) have x as 0 and y as 0(x_mapping[0] and y_mappping[0]) and hence associated with result[0, 0]. If we compute the minimum and maximum from this group, we will have 1 and 5 as results respectively. So, min_result[0, 0] will have 1 and max_result[0, 0] will have 5.

Note that if there is no association at all then the default value for result will be zero.

Current working solution

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.], dtype=np.float32)
max_result = np.zeros([4, 2], dtype=np.float32)
min_result = np.zeros([4, 2], dtype=np.float32) 
min_result[-y_mapping, x_mapping] = values # randomly initialising from values
for i in range(values.size):
    x = x_mapping[i]
    y = y_mapping[i]
    # maximum
    if values[i] > max_result[-y, x]:
        max_result[-y, x] = values[i]
    # minimum
    if values[i] < min_result[-y, x]:
        min_result[-y, x] = values[i]

min_result,

[[1., 0.],
[6., 2.],
[3., 0.],
[8., 0.]]

max_result,

[[5., 0.],
[6., 2.],
[7., 0.],
[8., 0.]]

Failed solutions

#1

min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-126de899a90e> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)

ValueError: object too deep for desired array

#2

min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, lidx, out= min_result)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-07e8c75ccaa5> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, lidx, out= min_result)

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,2)->(4,) (8,)->() (8,)->(8,) 

#3

lidx = ((-y_mapping) % 4) * 2 + x_mapping #from mentioned post
min_result = np.zeros([8], dtype=np.float32)
np.minimum.reduceat(values, lidx, out= min_result).reshape(4,2)

[[1., 4.],
[5., 5.],
[1., 3.],
[5., 7.]]

Question

How to use np.minimum.reduceat and np.maximum.reduceat for solving this problem? I'm looking for a solution that is optimised for runtime.

Side note

I'm using Numpy version 1.14.3 with Python 3.5.2


Solution

  • Approach #1

    Again, the most intuitive ones would be with numpy.ufunc.at. Now, since, these reductions would be performed against the existing values, we need to initialize the output with max values for minimum reductions and min values for maximum ones. Hence, the implementation would be -

    min_result[-y_mapping, x_mapping] = values.max()
    max_result[-y_mapping, x_mapping] = values.min()
    
    np.minimum.at(min_result, [-y_mapping, x_mapping], values)
    np.maximum.at(max_result, [-y_mapping, x_mapping], values)
    

    Approach #2

    To leverage np.ufunc.reduceat, we need to sort data -

    m,n = max_result.shape
    out_dtype = max_result.dtype
    lidx = ((-y_mapping)%m)*n + x_mapping
    
    sidx = lidx.argsort()
    idx = lidx[sidx]
    val = values[sidx]
    
    m_idx = np.flatnonzero(np.r_[True,idx[:-1] != idx[1:]])
    unq_ids = idx[m_idx]
    
    max_result_out.flat[unq_ids] = np.maximum.reduceat(val, m_idx)
    min_result_out.flat[unq_ids] = np.minimum.reduceat(val, m_idx)