I have an array x
and the result of argsorting it i
. I need to sort x
(and y
, irrelevant here) according to i
hundreds of times. It is therefore not doable to sort the data twice, everything needs to be achieved via the initial sorting i
.
If I take x[i]
it returns the sorted x
as expected. However, I now only want to use certain rows of x
via n
. So x[n]
returns the values of x
as expected.
My problem is that I need to sort these x[n]
via i
(and will have to do the same for y[n]
.
# Example data
x = np.array([14, 15, 9, 6, 19, 18, 4, 11, 10, 0])
i = np.argsort(x)
n = np.array([2, 5, 7, 8])
#x[n] -> array([ 9, 18, 11, 10])
Desired output: index_sort(x, n, i) = array([ 9, 10, 11, 18])
Some simple (failed) attempts:
x[n][i]
-> Indexing error, as x is now too small.
x[i[n]] -> array([ 6, 11, 15, 18])
, Is sorted, but contains the wrong data
x[i][n]
-> Same
For more context: I'm creating a specific type of decision tree model. For each layer of the tree I need the above operation a different n
. Sorting becomes prohibitively expensive and even checking for set membership via np.isin
might be too slow as well already. My intuition (though perhaps wrong) says it should be possible to achieve this via indexing only, without ever having to sort or check for set membership.
For all these layers x
and i
remain the same, but a different n
is used each time.
In [263]: x = np.array([14, 15, 9, 6, 19, 18, 4, 11, 10, 0])
...: i = np.argsort(x)
...: n = np.array([2, 5, 7, 8])
i
and n
do different, and unrelated indexing operations. Both make copies (not views), which don't retain any information on the original x
:
In [264]: x[i]
Out[264]: array([ 0, 4, 6, 9, 10, 11, 14, 15, 18, 19])
In [265]: x[n]
Out[265]: array([ 9, 18, 11, 10])
Let's try working with a boolean mask:
In [266]: m = np.zeros_like(x, dtype=bool)
In [267]: m[n] = True; m
Out[267]:
array([False, False, True, False, False, True, False, True, True,
False])
It selects elements from x
same as n
(though it won't handle duplicates the same):
In [268]: x[m]
Out[268]: array([ 9, 18, 11, 10])
Now try applying the sort to m
:
In [269]: mi = m[i]; m
Out[269]:
array([False, False, True, False, False, True, False, True, True,
False])
It does select the desired elements from the sorted x[i]
:
In [270]: x[i][mi]
Out[270]: array([ 9, 10, 11, 18])
We could also convert that boolean mask back to indices:
In [272]: ni = np.nonzero(mi)[0]; ni
Out[272]: array([3, 4, 5, 8], dtype=int64)
In [273]: x[i][ni]
Out[273]: array([ 9, 10, 11, 18])