I have a sorted array of integers (A) made up of runs of consecutive values separated by gaps.
A = array([1,2,3,4, 7,8,9, 23,24,25, 100])
I have an array (B) that contains a few values selected from A through an external process.
B = array([1,2,23,25,100])
I want to filter out the values in B that belong to the same sequence in A, so that only the first selected value of each sequence is returned:
C = array([1,23,100])
I have managed to do it by creating a second list to keep track of what has already been appended, but it seems kind of clumsy. I'm wondering if there is a better way to do this?
import numpy as np

A = np.array([1, 2, 3, 4, 7, 8, 9, 23, 24, 25, 100])
B = np.array([1, 2, 23, 25, 100])
C = []
already_used_sequence = []
for i, a in enumerate(A):
    # i - a is constant within a run of consecutive integers,
    # so it serves as an identifier for the sequence
    if i - a in already_used_sequence:  # did we already group this sequence?
        continue
    if a in B:  # is this value in B?
        C.append(a)
        already_used_sequence.append(i - a)
C = np.array(C)
You need a groupby operation, which is not easily done with NumPy. One option would be to use pandas:
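import numpy as np
import pandas as pd

# convert to pandas Series
s = pd.Series(A)
# group by successive values: a new group starts wherever
# the difference to the previous value is greater than 1
groups = s.diff().gt(1).cumsum()
# keep only the values of A present in B,
# then keep the first found value per group
out = s[np.isin(A, B)].groupby(groups).first().to_numpy()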
Output: array([ 1, 23, 100])
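If you would rather not depend on pandas, the same idea can probably be expressed with NumPy alone. The following is only a minimal sketch, assuming A is sorted in ascending order and that your NumPy version supports the prepend argument of np.diff (1.16 or later). The names run_id, mask and first_idx are mine and do not come from the question.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 7, 8, 9, 23, 24, 25, 100])
B = np.array([1, 2, 23, 25, 100])

# Label each run of consecutive integers in A: the label increases
# by one wherever the step to the previous value is greater than 1.
# prepend=A[0] makes the first difference 0 so the output keeps A's length.
run_id = np.cumsum(np.diff(A, prepend=A[0]) > 1)

# Boolean mask of the values of A that were selected into B.
mask = np.isin(A, B)

# np.unique with return_index=True gives, for each distinct label,
# the index of its first occurrence among the selected values.
_, first_idx = np.unique(run_id[mask], return_index=True)

C = A[mask][first_idx]
print(C.tolist())  # [1, 23, 100]
```

Here run_id plays the role of the groupby key in the pandas version, and np.unique stands in for the "first per group" step. Whether this is easier to read than the pandas one-liner is a matter of taste.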