Numpy: split array into parts according to sequence of values


I have a large one-dimensional NumPy array of np.int16 data and a boolean array that records whether each sample of the data (a sample is samplesize elements long) meets some criterion (is valid) or does not (is not valid). Something like this:

samplesize = 5
data = array([1, 2, 3, 4, 5, 3, 2, 1, 3, 2, 4, 5, 2, 1, 1], dtype=int16) 
membership = array([False, True, False], dtype=bool)

Here membership[0] indicates whether data[0*samplesize : 1*samplesize] is valid. In general, membership[i] covers data[i*samplesize : (i+1)*samplesize].

I want to split the data array into chunks according to runs of True values in the membership array. For example, if membership contains three or more successive True values, that run is considered a meaningful sample of data.

Example

True, True, True, True - valid sequence
True, True, False, True, True - invalid sequence

Assuming the i-th valid sequence starts at start[i] and ends at end[i], I want to split the data array into pieces that run from start[i] * samplesize to end[i] * samplesize.

How can I accomplish this?


Solution

  • I don't fully understand your question. Do you want the start and end indices of each run of three or more successive True values in membership?

    Here is code to do that. The basic idea is to take diff(membership) and find the indices of the rising and falling edges:

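    import numpy as np

    membership = np.random.randint(0, 2, 100)  # random 0/1 test data
    # Pad with zeros so runs touching either end still produce an edge.
    d = np.diff(np.r_[0, membership, 0])
    start = np.where(d == 1)[0]   # rising edges: first index of each run
    end = np.where(d == -1)[0]    # falling edges: one past the last index
    mask = (end - start) >= 3     # keep runs of three or more
    start = start[mask]
    end = end[mask]

    for s, e in zip(start, end):
        print(s, e, membership[s:e])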
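
The run indices found above can then be scaled by samplesize to slice the data array, as the question describes. The following is only a minimal sketch of that last step: the function name split_valid_runs, the min_run parameter, and the small test arrays are assumptions made for illustration and are not part of the original question or answer.

```python
import numpy as np

def split_valid_runs(data, membership, samplesize, min_run=3):
    """Return the slices of `data` covered by runs of >= min_run True values."""
    # Convert to a signed integer type so the differences can be -1 or +1,
    # and pad with zeros so runs touching either end still produce an edge.
    flags = np.asarray(membership, dtype=np.int8)
    d = np.diff(np.r_[0, flags, 0])
    start = np.where(d == 1)[0]    # first index of each run
    end = np.where(d == -1)[0]     # one past the last index of each run
    keep = (end - start) >= min_run
    # Scale sample indices to element indices in `data`.
    return [data[s * samplesize:e * samplesize]
            for s, e in zip(start[keep], end[keep])]

# Hypothetical test data: 8 samples of 2 elements each.
samplesize = 2
data = np.arange(16, dtype=np.int16)
membership = np.array([True, True, True, False, True, True, False, True])

chunks = split_valid_runs(data, membership, samplesize)
for chunk in chunks:
    print(chunk)
```

In this example only the first run (samples 0 to 2) is three samples long, so a single chunk covering data[0:6] is returned. The chunks are views into data rather than copies, so call .copy() on them if they need to be modified independently.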