Search code examples
pythonnumpysortingmatching

Ordering that maximizes pair-wise matches between two numpy arrays


Let's say we have two one-dimensional numpy arrays v1 and v2. The arrays are of equal length. The dtype of the arrays is '<U1' in this case. The two arrays may or may not have common items. In each array, all items are unique.

I want to write function get_maximum_match_order that:

  • Takes v1 and v2 as inputs.

  • Returns an index array that can be then used to re-order v2. The re-ordered v2 should then have maximal pair-wise matches with v1.

Example cases:

Case 1

Here the arrays match each other perfectly already, so the order will be neutral. v2 will remain the same after the order is applied.


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['A', 'B', 'C'])

order = get_maximum_match_order(v1, v2)

order -> np.array([0, 1, 2])

v2[order] -> np.array(['A', 'B', 'C']

Case 2

In this case all items are not present in both arrays. After the order has been applied to v2, items 'A' and 'B' will match.


v1 = np.array(['A', 'C', 'B'])

v2 = np.array(['B', 'A', 'E'])

order = get_maximum_match_order(v1, v2)

order -> np.array([1, 2, 0])

v2[order] -> np.array(['A', 'E', 'B'])

Case 3


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['C', 'B', 'A'])

order = get_maximum_match_order(v1, v2)

order -> np.array([2, 1, 0])

v2[order] -> np.array(['A', 'B', 'C'])

Case 4

Here the arrays don't have any common items, so the ordering will be neutral.


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['D', 'E', 'F'])

order = get_maximum_match_order(v1, v2)

order -> np.array([0, 1, 2])

v2[order] -> np.array(['D', 'E', 'F'])

Case 5


v1 = np.array(['A', 'B', 'C'])

v2 = np.array(['A', 'C', 'B'])

order = get_maximum_match_order(v1, v2)

order -> np.array([0, 2, 1])

v2[order] -> np.array(['A', 'B', 'C'])

Case 6


v1 = np.array(['A', 'G', 'B'])

v2 = np.array(['B', 'F', 'A'])

order = get_maximum_match_order(v1, v2)

order -> np.array([2, 1, 0])

v2[order] -> np.array(['A', 'F', 'B'])

Case 7


v1 = np.array(['A', 'G', 'B', 'C', 'E'])

v2 = np.array(['B', 'F', 'A', 'E', 'C'])

order = get_maximum_match_order(v1, v2)

order -> np.array([2, 1, 0, 4, 3])

v2[order] -> np.array(['A', 'F', 'B', 'C', 'E'])

I've tried experimenting with numpy's intersect1d but haven't been able to nail this down perfectly.


Solution

  • Just find pairs and align them and then distribute rest elements (indexes) any way you like.

    from contextlib import suppress
    import numpy as np
    
    def index(vs, v):
        with suppress(IndexError):
            return np.where(vs == v)[0][0]
    
    def get_maximum_match_order(v1, v2):
        ixs = [index(v1, v) for v in v2]
        it = iter(set(range(len(v1))) - set(ixs))
        return np.array([next(it) if ix is None else ix for ix in ixs])
    
    if __name__ == "__main__":
        v1 = np.array(['A', 'G', 'B', 'C', 'E'])
        v2 = np.array(['B', 'F', 'A', 'E', 'C'])
        print(get_maximum_match_order(v1, v2))