What I have is a big one-dimensional NumPy array of np.int16 data and a boolean array that records, for each sample of the data (each sample is samplesize elements long), whether that sample meets some criteria (is valid) or not (is invalid).
That is, I have something like this:
samplesize = 5
data = array([1, 2, 3, 4, 5, 3, 2, 1, 3, 2, 4, 5, 2, 1, 1], dtype=int16)
membership = array([False, True, False], dtype=bool)
Here membership[0] indicates whether data[0*samplesize : 1*samplesize] is valid; in general, membership[i] covers data[i*samplesize : (i+1)*samplesize].
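Because each flag covers exactly samplesize consecutive elements, one way to see this mapping is to reshape data so that each row is one sample. A minimal sketch, assuming len(data) is an exact multiple of samplesize:

```python
import numpy as np

samplesize = 5
data = np.array([1, 2, 3, 4, 5, 3, 2, 1, 3, 2, 4, 5, 2, 1, 1], dtype=np.int16)
membership = np.array([False, True, False], dtype=bool)

# One row per sample: row i is data[i*samplesize:(i+1)*samplesize].
# Assumes len(data) is an exact multiple of samplesize.
samples = data.reshape(-1, samplesize)

# Boolean indexing keeps only the rows whose flag is True.
valid_samples = samples[membership]
print(valid_samples.tolist())  # [[3, 2, 1, 3, 2]]
```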
What I want is to split the data array into chunks according to runs of True values in the membership array. Specifically, if membership contains three or more successive True values, that run is considered a meaningful stretch of data.
Example:
True, True, True, True - valid sequence
True, True, False, True, True - invalid sequence
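The validity rule can be checked against these two example sequences with a small pure-Python helper. longest_true_run is a hypothetical name used only for illustration; it relies on itertools.groupby, which groups consecutive equal values, so each group of True values is one run:

```python
from itertools import groupby

def longest_true_run(flags):
    """Length of the longest run of consecutive True values (hypothetical helper)."""
    # groupby yields (key, group) for each stretch of equal consecutive values.
    return max((sum(1 for _ in group) for is_true, group in groupby(flags) if is_true),
               default=0)

MIN_RUN = 3  # the "three or more successive True" threshold from the question

print(longest_true_run([True, True, True, True]) >= MIN_RUN)         # True
print(longest_true_run([True, True, False, True, True]) >= MIN_RUN)  # False
```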
Assuming the start of the i-th valid sequence is start[i] and its end is end[i], I want to split the data array into pieces that start at start[i] * samplesize and end at end[i] * samplesize.
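To pin down the intended boundaries: start[i] and end[i] are counted in samples (indices into membership), so multiplying by samplesize converts them to element offsets in data. The sketch below uses made-up start/end values and assumes end[i] is exclusive, i.e. one past the last valid sample, like a Python slice bound:

```python
import numpy as np

samplesize = 5
data = np.arange(40, dtype=np.int16)  # illustrative data: 8 samples of length 5

# Hypothetical run boundaries in membership units; end is exclusive.
start = np.array([0, 4])
end = np.array([3, 8])

# Each piece covers data[start[i]*samplesize : end[i]*samplesize].
pieces = [data[s * samplesize:e * samplesize] for s, e in zip(start, end)]
for p in pieces:
    print(len(p), p[0], p[-1])  # prints "15 0 14", then "20 20 39"
```

Basic slicing returns views into data, so nothing is copied; call .copy() on a piece if it must be independent of the original array.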
How can I accomplish this?
I don't fully understand your question. Do you want the start and end indices of the runs in membership that contain 3 or more successive True values?
If so, here is code that does it. The basic idea is to take diff(membership) and find the indices of the rising edges (False to True) and the falling edges (True to False):
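import numpy as np
# Example flags: 100 random 0/1 values standing in for the boolean membership array
membership = np.random.randint(0, 2, 100)
# Pad with 0 at both ends so every run of 1s has a rising (+1) and a falling (-1) edge
d = np.diff(np.r_[0, membership, 0])
start = np.where(d == 1)[0]   # index of the first True in each run
end = np.where(d == -1)[0]    # index one past the last True in each run
# Keep only runs of 3 or more successive True values
mask = (end - start) >= 3
start = start[mask]
end = end[mask]
for s, e in zip(start, end):
    print(s, e, membership[s:e])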
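Combining this run detection with the slicing described in the question gives the full split. split_valid_chunks and its min_run parameter are hypothetical names for this sketch; it uses np.flatnonzero, which is equivalent to np.where(cond)[0] for a 1-D condition, and treats end as exclusive:

```python
import numpy as np

def split_valid_chunks(data, membership, samplesize, min_run=3):
    """Return the pieces of data covered by runs of >= min_run consecutive True flags.

    Hypothetical helper: diff-based run detection followed by slicing data.
    """
    flags = np.asarray(membership).astype(int)      # bool -> 0/1 so diff gives +1/-1
    d = np.diff(np.r_[0, flags, 0])                 # pad so edge runs are closed
    start = np.flatnonzero(d == 1)                  # first flag index of each run
    end = np.flatnonzero(d == -1)                   # one past the last flag index
    keep = (end - start) >= min_run
    return [data[s * samplesize:e * samplesize]
            for s, e in zip(start[keep], end[keep])]

# Usage: 10 samples of length 2; runs at flags 0-2 and 7-9 are long enough,
# the run at flags 4-5 is too short.
samplesize = 2
data = np.arange(20, dtype=np.int16)
membership = np.array([True, True, True, False, True, True, False, True, True, True])
chunks = split_valid_chunks(data, membership, samplesize)
for c in chunks:
    print(c.tolist())  # [0, 1, 2, 3, 4, 5], then [14, 15, 16, 17, 18, 19]
```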