I have a 1D array of sorted, non-unique numbers; how many times each value repeats is arbitrary. It is paired with an array of weights of the same size. Within a given run of identical elements, the associated weights may or may not repeat, and the weights array as a whole may or may not contain repeated values. E.g.:
pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10])
weights = np.array([2, 10, 20, 8, 5, 7, 15, 7, 2])
For each unique element of pos, I need to extract the index of the occurrence with the greatest weight.
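To make the requirement concrete, the expected output for the sample above (it matches the result of the loop solution below) is:

unique_pos_idx = [1, 2, 6, 7]   # picks weights 10, 20, 15, 7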
The working solution I came up with involves looping:
import numpy as np

pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10])
weights = np.array([2, 10, 20, 8, 5, 7, 15, 7, 2])
# Get the number of occurrences of the elements in pos, but throw away
# the unique array; it's not the one I want.
_, ucounts = np.unique(pos, return_counts=True)
# Initialize the output array.
unique_pos_idx = np.zeros([ucounts.size], dtype=np.uint32)
last = 0
for i in range(ucounts.size):
    maxpos = np.argmax(weights[last:last+ucounts[i]])
    unique_pos_idx[i] = last + maxpos
    last += ucounts[i]
# Result is:
# unique_pos_idx = [1 2 6 7]
but I’m not using much of the Python language or NumPy (apart from the arrays themselves), so I wonder if there is a more Pythonic and/or more efficient solution, perhaps even faster than a Cython version of the above?
Thanks
Here's one vectorized approach -
sidx = np.lexsort([weights, pos])  # sort by pos, breaking ties by weights
out = sidx[np.r_[np.flatnonzero(pos[1:] != pos[:-1]), -1]]  # last slot of each group
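To unpack why this works: np.lexsort sorts primarily by its last key (pos) and secondarily by weights, so within each run of equal pos values the max-weight occurrence lands last in the sorted order; pos[1:] != pos[:-1] marks the end of every run except the final one, which the trailing -1 covers. A quick check on the question's sample:

import numpy as np

pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10])
weights = np.array([2, 10, 20, 8, 5, 7, 15, 7, 2])

sidx = np.lexsort([weights, pos])   # [0 1 3 2 4 5 6 8 7]
out = sidx[np.r_[np.flatnonzero(pos[1:] != pos[:-1]), -1]]
print(out)            # [1 2 6 7]
print(weights[out])   # [10 20 15  7] -> max weight of each group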
Possible improvement(s) on performance -
1] A faster way to get the sorted indices sidx with scaling -
sidx = (pos*(weights.max()+1) + weights).argsort()
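The scaling folds the two sort keys into one: since every weight is strictly less than weights.max()+1, the combined integer pos*(weights.max()+1) + weights orders exactly like the (pos, weight) pair, and a single argsort is cheaper than lexsort. One thing to watch (my caveat, not part of the original answer): for large values the product can overflow the arrays' integer dtype, so casting up may be needed -

scale = weights.max() + 1
# Cast to a wider dtype first if pos.max() * scale might overflow:
sidx = (pos.astype(np.int64) * scale + weights).argsort()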
2] The indexing at the end could be made faster with boolean indexing, especially when dealing with many such intervals/groupings -
out = sidx[np.concatenate((pos[1:] != pos[:-1], [True]))]
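On the sample, that boolean mask simply flags the last slot of each group, avoiding the integer fancy-indexing done with np.r_; a quick sketch, reusing sidx from above:

mask = np.concatenate((pos[1:] != pos[:-1], [True]))
# mask -> [False True False True False False True False True]
print(sidx[mask])   # [1 2 6 7]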
Runtime test
All approaches:
def org_app(pos, weights):
    _, ucounts = np.unique(pos, return_counts=True)
    unique_pos_idx = np.zeros([ucounts.size], dtype=np.uint32)
    last = 0
    for i in range(ucounts.size):
        maxpos = np.argmax(weights[last:last+ucounts[i]])
        unique_pos_idx[i] = last + maxpos
        last += ucounts[i]
    return unique_pos_idx

def vec_app(pos, weights):
    sidx = np.lexsort([weights, pos])
    return sidx[np.r_[np.flatnonzero(pos[1:] != pos[:-1]), -1]]

def vec_app_v2(pos, weights):
    sidx = (pos*(weights.max()+1) + weights).argsort()
    return sidx[np.concatenate((pos[1:] != pos[:-1], [True]))]
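Before the full-scale timings, a quick sanity check of all three functions on the question's sample (my addition; the large-scale verification follows below):

pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10])
weights = np.array([2, 10, 20, 8, 5, 7, 15, 7, 2])
print(org_app(pos, weights))     # [1 2 6 7]
print(vec_app(pos, weights))     # [1 2 6 7]
print(vec_app_v2(pos, weights))  # [1 2 6 7]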
Timings and verification -
For the setup, let's use the sample and tile it 10000 times with scaling, as we intend to create 10000 times as many intervals. Also, let's use unique numbers in weights, so that the argmax indices aren't confused by identical numbers:
In [155]: # Setup input
...: pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10,])
...: pos = (pos + 10*np.arange(10000)[:,None]).ravel()
...: weights = np.random.choice(10*len(pos), size=len(pos), replace=0)
...:
...: print np.allclose(org_app(pos, weights), vec_app(pos, weights))
...: print np.allclose(org_app(pos, weights), vec_app_v2(pos, weights))
...:
True
True
In [156]: %timeit org_app(pos, weights)
...: %timeit vec_app(pos, weights)
...: %timeit vec_app_v2(pos, weights)
...:
10 loops, best of 3: 56.4 ms per loop
100 loops, best of 3: 14.8 ms per loop
1000 loops, best of 3: 1.77 ms per loop
In [157]: 56.4/1.77 # Speedup with vectorized one over loopy
Out[157]: 31.864406779661017