
Cumulative argmax of a NumPy array


Consider the array a:

import numpy as np
np.random.seed([3,1415])
a = np.random.randint(0, 10, (10, 2))
a

array([[0, 2],
       [7, 3],
       [8, 7],
       [0, 6],
       [8, 6],
       [0, 2],
       [0, 4],
       [9, 7],
       [3, 2],
       [4, 3]])

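What is a vectorized way to get the cumulative argmax down each column, i.e. for every row i, the position of the maximum of a[:i + 1] within each column? The expected result is: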

array([[0, 0],  <-- row 0: each column's max is at position 0
       [1, 1],  <-- 7 > 0, so 1st col = 1; 3 > 2, so 2nd col = 1
       [2, 2],  <-- 8 > 7, so 1st col = 2; 7 > 3, so 2nd col = 2
       [2, 2],  <-- 0 < 8 and 6 < 7, so both columns stay the same
       [2, 2],  <-- 8 only ties the 1st col's max, so argmax keeps the first occurrence (2)
       [2, 2],
       [2, 2],
       [7, 2],  <-- 9 is the new max of the 1st col, so its argmax is now 7
       [7, 2],
       [7, 2]])
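The annotations boil down to one rule per column: keep a running maximum and move the index only when a strictly larger value appears, so a tie keeps the earlier position (the same first-occurrence rule np.argmax uses). A minimal pure-Python sketch of that rule, useful only as a slow reference; the name cumargmax_loop is hypothetical, and the array is a hard-coded copy of a as printed above:

```python
import numpy as np

def cumargmax_loop(a):
    """Hypothetical slow reference: cumulative argmax along axis 0 with Python loops.

    Tracks each column's running maximum and the row where it first appeared.
    A value equal to the current maximum does not move the index, matching
    np.argmax's first-occurrence rule.
    """
    a = np.asarray(a)
    out = np.empty(a.shape, dtype=np.intp)
    best_val = a[0].copy()                          # running max per column
    best_idx = np.zeros(a.shape[1], dtype=np.intp)  # row where that max first appeared
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] > best_val[j]:               # strictly greater: ties keep the old index
                best_val[j] = a[i, j]
                best_idx[j] = i
        out[i] = best_idx
    return out

# Hard-coded copy of the sample array printed above.
a = np.array([[0, 2], [7, 3], [8, 7], [0, 6], [8, 6],
              [0, 2], [0, 4], [9, 7], [3, 2], [4, 3]])
print(cumargmax_loop(a)[:, 0])  # [0 1 2 2 2 2 2 7 7 7]
```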

Here is a non-vectorized way to do it with pandas.

As the window expands, argmax is applied to the whole growing window, so row i holds the position of the maximum of rows 0 through i.

import pandas as pd

pd.DataFrame(a).expanding().apply(np.argmax, raw=True).astype(int).values

array([[0, 0],
       [1, 1],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],
       [7, 2],
       [7, 2]])
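The same expanding-window idea can be sketched without pandas by calling argmax on every prefix a[:i + 1]. This assumes small inputs and is meant only as a checking aid, since it repeats O(n) work for each of the n rows; expanding_argmax is a hypothetical name, and unlike the pandas version it yields integers directly, so no astype(int) is needed.

```python
import numpy as np

def expanding_argmax(a):
    """Hypothetical helper: cumulative argmax via explicit expanding windows (O(n^2))."""
    a = np.asarray(a)
    # Row i is the column-wise argmax over the prefix a[:i + 1].
    return np.array([a[:i + 1].argmax(axis=0) for i in range(a.shape[0])])

# Hard-coded copy of the sample array printed above.
a = np.array([[0, 2], [7, 3], [8, 7], [0, 6], [8, 6],
              [0, 2], [0, 4], [9, 7], [3, 2], [4, 3]])
print(expanding_argmax(a).tolist())
# [[0, 0], [1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [7, 2], [7, 2], [7, 2]]
```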

Solution

  • Here's a vectorized pure NumPy solution that performs pretty snappily:

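    def cumargmax(a):
        # Running maximum of each column (axis 0).
        m = np.maximum.accumulate(a)
        # Row indices 0..n-1, repeated across the columns.
        x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
        # Keep a row's index only where the running max strictly increases; zero it elsewhere.
        x[1:] *= m[:-1] < m[1:]
        # Carry the most recent kept index forward down each column.
        np.maximum.accumulate(x, axis=0, out=x)
        return x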
    

    Then we have:

    >>> cumargmax(a)
    array([[0, 0],
           [1, 1],
           [2, 2],
           [2, 2],
           [2, 2],
           [2, 2],
           [2, 2],
           [7, 2],
           [7, 2],
           [7, 2]])
    

    Some quick testing on arrays with thousands to millions of values suggests that this is between 10 and 50 times faster than looping at the Python level (either implicitly, e.g. via pandas' expanding().apply, or explicitly with a for loop).
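To see why cumargmax works, the sketch below runs its steps one at a time on the same sample a (hard-coded from the printout above) and prints the intermediates for the 1st column. The variable names grew and stamped are hypothetical additions for illustration; the steps themselves are the ones in cumargmax.

```python
import numpy as np

# Hard-coded copy of the sample array printed above.
a = np.array([[0, 2], [7, 3], [8, 7], [0, 6], [8, 6],
              [0, 2], [0, 4], [9, 7], [3, 2], [4, 3]])

# Step 1: running maximum of each column.
m = np.maximum.accumulate(a)
# Step 2: True where the running max strictly increases (row 0 has no predecessor).
grew = m[:-1] < m[1:]
# Step 3: stamp each row with its own index where the max grew, 0 elsewhere.
x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
x[1:] *= grew
stamped = x.copy()  # keep a copy before the in-place accumulate
# Step 4: carry the latest stamped index forward down each column.
np.maximum.accumulate(x, axis=0, out=x)

print(m[:, 0])        # [0 7 8 8 8 8 8 9 9 9]
print(stamped[:, 0])  # [0 1 2 0 0 0 0 7 0 0]
print(x[:, 0])        # [0 1 2 2 2 2 2 7 7 7]
```

Because the mask uses a strict <, a value that only ties the running max (the 8 in row 4 of the 1st column, or the 7 in row 7 of the 2nd) is never stamped, so the result keeps the first occurrence, just like np.argmax.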