Let's say we have two one-dimensional numpy arrays `v1` and `v2`. The arrays are of equal length. The dtype of the arrays is `'<U1'` in this case. The two arrays may or may not have common items. In each array, all items are unique.
I want to write a function `get_maximum_match_order` that:

- takes `v1` and `v2` as inputs;
- returns an index array that can then be used to re-order `v2`. The re-ordered `v2` should then have the maximal number of pair-wise matches with `v1`.
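To pin down what "maximal" means, here is a brute-force reference sketch (the names `count_matches` and `brute_force_order` are hypothetical, not from any library). It tries every permutation of `v2`'s indices and keeps the first one with the most matches. Because all items are unique, that maximum is simply the number of items the two arrays share. It runs in O(n!) time, so it is only useful for checking a real implementation on tiny inputs.

```python
from itertools import permutations

import numpy as np


def count_matches(v1, v2, order):
    """Hypothetical helper: positions where v1 and the re-ordered v2 agree."""
    return int(np.count_nonzero(v1 == v2[np.asarray(order)]))


def brute_force_order(v1, v2):
    """Hypothetical reference: first permutation with the most matches.

    max() returns the first maximal item, and permutations() starts with
    the identity, so an order that changes nothing is preferred on ties.
    """
    best = max(permutations(range(len(v2))),
               key=lambda p: count_matches(v1, v2, p))
    return np.array(best)


v1 = np.array(['A', 'C', 'B'])
v2 = np.array(['B', 'A', 'E'])
print(brute_force_order(v1, v2))  # [1 2 0]
```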
Here the arrays already match each other perfectly, so the order will be neutral: `v2` will remain the same after the order is applied.

```python
v1 = np.array(['A', 'B', 'C'])
v2 = np.array(['A', 'B', 'C'])
order = get_maximum_match_order(v1, v2)
# order     -> np.array([0, 1, 2])
# v2[order] -> np.array(['A', 'B', 'C'])
```
In this case, not all items are present in both arrays. After the order has been applied to `v2`, items 'A' and 'B' will match.

```python
v1 = np.array(['A', 'C', 'B'])
v2 = np.array(['B', 'A', 'E'])
order = get_maximum_match_order(v1, v2)
# order     -> np.array([1, 2, 0])
# v2[order] -> np.array(['A', 'E', 'B'])
```
```python
v1 = np.array(['A', 'B', 'C'])
v2 = np.array(['C', 'B', 'A'])
order = get_maximum_match_order(v1, v2)
# order     -> np.array([2, 1, 0])
# v2[order] -> np.array(['A', 'B', 'C'])
```
Here the arrays don't have any common items, so the ordering will be neutral.

```python
v1 = np.array(['A', 'B', 'C'])
v2 = np.array(['D', 'E', 'F'])
order = get_maximum_match_order(v1, v2)
# order     -> np.array([0, 1, 2])
# v2[order] -> np.array(['D', 'E', 'F'])
```
```python
v1 = np.array(['A', 'B', 'C'])
v2 = np.array(['A', 'C', 'B'])
order = get_maximum_match_order(v1, v2)
# order     -> np.array([0, 2, 1])
# v2[order] -> np.array(['A', 'B', 'C'])
```
```python
v1 = np.array(['A', 'G', 'B'])
v2 = np.array(['B', 'F', 'A'])
order = get_maximum_match_order(v1, v2)
# order     -> np.array([2, 1, 0])
# v2[order] -> np.array(['A', 'F', 'B'])
```
```python
v1 = np.array(['A', 'G', 'B', 'C', 'E'])
v2 = np.array(['B', 'F', 'A', 'E', 'C'])
order = get_maximum_match_order(v1, v2)
# order     -> np.array([2, 1, 0, 4, 3])
# v2[order] -> np.array(['A', 'F', 'B', 'C', 'E'])
```
I've tried experimenting with numpy's `intersect1d` but haven't been able to nail this down perfectly.
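For reference, `np.intersect1d` can do most of the work when called with `return_indices=True` (available since NumPy 1.15), because it then also returns where each common item sits in each input. A minimal sketch under that assumption, with a hypothetical function name `match_order_intersect1d`: put each common item's `v2` index at its `v1` position, then hand the leftover `v2` indices, in ascending order, to the remaining slots. Filling in ascending order is what makes the no-common-items case come out neutral.

```python
import numpy as np


def match_order_intersect1d(v1, v2):
    """Hypothetical helper: gather indices for v2 built with np.intersect1d."""
    n = len(v1)  # assumes len(v1) == len(v2), as in the question
    # i1[k] and i2[k] are the positions of the k-th common item in v1 and v2.
    _, i1, i2 = np.intersect1d(v1, v2, return_indices=True)
    order = np.empty(n, dtype=np.intp)
    order[i1] = i2  # slot i1[k] takes v2[i2[k]], so the pair lines up
    # Slots of v1 without a partner, and v2 indices not used yet (both sorted).
    free_slots = np.setdiff1d(np.arange(n), i1)
    unused = np.setdiff1d(np.arange(n), i2)
    order[free_slots] = unused
    return order


v1 = np.array(['A', 'C', 'B'])
v2 = np.array(['B', 'A', 'E'])
order = match_order_intersect1d(v1, v2)
print(order, v2[order])  # [1 2 0] ['A' 'E' 'B']
```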
Just find the matching pairs, align them, and then distribute the remaining elements (indices) any way you like.
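```python
from contextlib import suppress

import numpy as np


def index(vs, v):
    # Position of v in vs, or None if v is absent.
    with suppress(IndexError):
        return np.where(vs == v)[0][0]
    return None


def get_maximum_match_order(v1, v2):
    # v2[order] places v2[order[i]] at slot i, so look up each v1 item in v2.
    ixs = [index(v2, v) for v in v1]
    # Hand out the unmatched v2 indices, in ascending order, to unmatched slots.
    it = iter(sorted(set(range(len(v2))) - set(ixs)))
    return np.array([next(it) if ix is None else ix for ix in ixs])


if __name__ == "__main__":
    v1 = np.array(['A', 'G', 'B', 'C', 'E'])
    v2 = np.array(['B', 'F', 'A', 'E', 'C'])
    print(get_maximum_match_order(v1, v2))  # [2 1 0 4 3]
```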
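For long arrays, calling `np.where` once per element makes the lookup quadratic. A minimal sketch of the same idea with a dictionary (an alternative, not part of the answer above; the name `get_maximum_match_order_dict` is hypothetical) indexes `v2` once and runs in linear time. It relies on the question's guarantee that items within each array are unique.

```python
import numpy as np


def get_maximum_match_order_dict(v1, v2):
    """Hypothetical O(n) variant: index v2 once with a dict instead of np.where."""
    # Items are unique, so each value maps to exactly one position in v2.
    pos_in_v2 = {item: j for j, item in enumerate(v2.tolist())}
    # For each slot of v1, the v2 index of the same item, or None if absent.
    matched = [pos_in_v2.get(item) for item in v1.tolist()]
    used = {j for j in matched if j is not None}
    # Unused v2 indices, ascending, fill the unmatched slots.
    leftovers = iter(j for j in range(len(v2)) if j not in used)
    return np.array([next(leftovers) if j is None else j for j in matched],
                    dtype=np.intp)


v1 = np.array(['A', 'G', 'B', 'C', 'E'])
v2 = np.array(['B', 'F', 'A', 'E', 'C'])
order = get_maximum_match_order_dict(v1, v2)
print(order, v2[order])  # [2 1 0 4 3] ['A' 'F' 'B' 'C' 'E']
print(np.count_nonzero(v1 == v2[order]))  # 4 matching positions
```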