Ordering that maximizes pair-wise matches between two numpy arrays

Let's say we have two one-dimensional numpy arrays v1 and v2. The arrays are of equal length. The dtype of the arrays is '<U1' in this case. The two arrays may or may not have common items. In each array, all items are unique.

I want to write function get_maximum_match_order that:

Takes v1 and v2 as inputs.
Returns an index array that can be then used to re-order v2. The re-ordered v2 should then have maximal pair-wise matches with v1.

Example cases:

Case 1

Here the arrays match each other perfectly already, so the order will be neutral. v2 will remain the same after the order is applied.


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['A', 'B', 'C'])

order = get_maximum_match_order(v1, v2)

order -> np.array([0, 1, 2])

v2[order] -> np.array(['A', 'B', 'C']

Case 2

In this case all items are not present in both arrays. After the order has been applied to v2, items 'A' and 'B' will match.


v1 = np.array(['A', 'C', 'B'])

v2 = np.array(['B', 'A', 'E'])

order = get_maximum_match_order(v1, v2)

order -> np.array([1, 2, 0])

v2[order] -> np.array(['A', 'E', 'B'])

Case 3


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['C', 'B', 'A'])

order = get_maximum_match_order(v1, v2)

order -> np.array([2, 1, 0])

v2[order] -> np.array(['A', 'B', 'C'])

Case 4

Here the arrays don't have any common items, so the ordering will be neutral.


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['D', 'E', 'F'])

order = get_maximum_match_order(v1, v2)

order -> np.array([0, 1, 2])

v2[order] -> np.array(['D', 'E', 'F'])

Case 5


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['A', 'C', 'B'])

order = get_maximum_match_order(v1, v2)

order -> np.array([0, 2, 1])

v2[order] -> np.array(['A', 'B', 'C'])

Case 6


v1 = np.array(['A', 'G', 'B'])

v2 = np.array(['B', 'F', 'A'])

order = get_maximum_match_order(v1, v2)

order -> np.array([2, 1, 0])

v2[order] -> np.array(['A', 'F', 'B'])

Case 7


v1 = np.array(['A', 'G', 'B', 'C', 'E'])

v2 = np.array(['B', 'F', 'A', 'E', 'C'])

order = get_maximum_match_order(v1, v2)

order -> np.array([2, 1, 0, 4, 3])

v2[order] -> np.array(['A', 'F', 'B', 'C', 'E'])

I've tried experimenting with numpy's intersect1d but haven't been able to nail this down perfectly.

Solution

Just find pairs and align them and then distribute rest elements (indexes) any way you like.

from contextlib import suppress
import numpy as np

def index(vs, v):
    with suppress(IndexError):
        return np.where(vs == v)[0][0]

def get_maximum_match_order(v1, v2):
    ixs = [index(v1, v) for v in v2]
    it = iter(set(range(len(v1))) - set(ixs))
    return np.array([next(it) if ix is None else ix for ix in ixs])

if __name__ == "__main__":
    v1 = np.array(['A', 'G', 'B', 'C', 'E'])
    v2 = np.array(['B', 'F', 'A', 'E', 'C'])
    print(get_maximum_match_order(v1, v2))