python arrays numpy indexing python-itertools

Differences in an array based on groups defined by another array


I have two arrays of the same size. One, call it A, contains runs of repeated numbers; the other, B, contains random numbers.

import numpy as np

A = np.array([1,1,1,2,2,2,0,0,0,3,3])
B = np.array([1,2,3,6,5,4,7,8,9,10,11])

For each group of repeated numbers in A, I need the difference between the last and the first element of B in that group. More specifically, I need an output C such as

C = [2, -2, 2, 1]

where each term is the difference 3 - 1, 4 - 6, 9 - 7, and 11 - 10, i.e., the difference between the extremes in B identified by the groups of repeated numbers in A.

I tried to play around with itertools.groupby to isolate the groups in the first array, but it is not clear to me how to use the resulting indices to compute the differences in the second.


Solution

  • Edit: C is now sorted the same way as in the question

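    C = []
    # idx holds the index of the first appearance of each unique value in A
    _, idx = np.unique(A, return_index=True)
    # visit the group labels in their order of appearance in A
    for i in A[np.sort(idx)]:
        bs = B[A == i]  # values of B belonging to group i
        C.append(int(bs[-1] - bs[0]))  # last minus first

    print(C)  # [2, -2, 2, 1]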
    

    With return_index=True, np.unique also returns, for each unique value in A, the index of its first appearance.

    A[np.sort(idx)] gives the unique values in the order in which they first appear in A, so the loop visits the groups in that order.
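To make the two steps above concrete, here is a small sketch using the A from the question. The names values and ordered are only illustrative and do not appear in the answer's code.

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])

# Sorted unique values and the index of each value's first appearance in A.
values, idx = np.unique(A, return_index=True)
print(values)  # [0 1 2 3]
print(idx)     # [6 0 3 9]

# Sorting the first-appearance indices restores the order of appearance.
ordered = A[np.sort(idx)]
print(ordered)  # [1 2 0 3]
```

Without the np.sort step, the loop would visit the groups in sorted order (0, 1, 2, 3) rather than in the order used in the question.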

    B[A==i] extracts the values of B at the positions where A equals i.
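A possible loop-free variant, offered as a sketch rather than as part of the original answer: it assumes that a "group" means a run of consecutive equal values in A, and finds the first and last position of each run directly. The names starts and ends are made up for this illustration.

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

# A run starts at index 0 and at every index where A differs from its predecessor.
starts = np.flatnonzero(np.r_[True, A[1:] != A[:-1]])
# Each run ends just before the next one starts; the last run ends at len(A) - 1.
ends = np.r_[starts[1:] - 1, len(A) - 1]

# Last element minus first element of B within each run.
C = (B[ends] - B[starts]).tolist()
print(C)  # [2, -2, 2, 1]
```

The two approaches agree on the question's data but differ when a value reappears later in A: for A = [1, 1, 2, 1, 1] the mask B[A==i] would merge both runs of 1 into one group, whereas this sketch treats them as two separate groups.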