I've found similar questions posted here, but none that apply to row-defined time-series data. I anticipate the solution will involve numpy or scipy. Because I have so much data, I'd prefer not to use pandas dataframes.
I have many runs of 19-channel EEG data stored in 2d numpy arrays. I've gone through and marked noisy data as nan, so a given run might look something like:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19
nan 7 5 4 nan nan 7 9 0 -3 nan 2 nan nan 5 7 6 nan 8
0 6 7 3 5 9 2 2 4 6 8 7 5 6 4 -1 nan -8 -9
6 8 7 7 0 3 2 4 5 1 3 7 3 8 4 6 9 0 0
...
nan nan nan 3 5 -1 0 nan nan nan 1 2 0 -1 -2 nan nan nan nan
(without channel labels)
Each run is between 80,000 and 120,000 rows (cycles) long.
For each of these runs, I want to create a new stack of contiguous non-overlapping epochs where no values were artifacted to nan. Something like:
def generate_contigs(run, length):
    contigs = np.ndarray(...)  # 3-D: arbitrary depth x length x 19
    count = 0
    for row in run:
        if nan not in row:
            count += 1
            if count == length:
                # stack the last (length) rows onto the contigs ndarray
                count = 0
        else:
            count = 0
    return contigs
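One pitfall in the pseudocode above: a check like `nan not in row` won't behave as hoped, because nan compares unequal to everything, including itself, and membership tests on numpy arrays are based on elementwise equality. The reliable per-row test is numpy's isnan. A minimal sketch (toy 4-value rows, not real EEG data):

```python
import numpy as np

# nan is not equal to itself, so equality-based checks fail
row = np.array([0.0, 6.0, np.nan, 3.0])
print(np.nan == np.nan)      # False
print(np.nan in row)         # False: `in` on an ndarray uses ==, which misses nan
print(np.isnan(row).any())   # True: the row is artifacted

clean = np.array([6.0, 8.0, 7.0, 7.0])
print(np.isnan(clean).any()) # False: safe to count toward an epoch
```

This is why the answer below builds its window test out of np.isnan and np.any rather than a membership check.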
Say, for example, that I specified length 4 (arbitrarily small), and that my function found 9 non-overlapping contigs in which no value was nan for 4 straight rows.
My output should look something like:
contigs = [
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array]
]
Where each element in the output stack resembles the following:
[4 6 5 8 3 5 4 1 8 8 7 5 6 4 3 5 6 6 5]
[5 5 7 2 2 9 8 7 7 8 3 0 7 4 4 6 3 7 3]
[4 4 6 7 9 0 9 9 8 8 7 7 6 6 5 5 4 4 3]
[1 2 3 4 5 4 3 6 5 4 3 7 6 5 8 7 6 9 8]
Where the 4 rows contained in that element appear consecutively in the original run's data array.
I feel like I'm pretty close here, but I'm struggling with the row operations and minimizing iteration. Bonus points if you can find a way to attach the start/stop row indices as a tuple for later analysis.
You could use numpy slicing to roll a window over the array and check whether each length x 19 selection contains any nan value, using numpy isnan and numpy any.
If the selection is free of nan, append it to the contigs list and jump ahead by length; if it contains a nan, advance the index by 1 and test the next selection.
Along the way it is easy to record the index of the first row of each stacked selection.
import numpy as np

def generate_contigs(run, length):
    i = 0
    contigs = []
    startindexes = []
    # slide a window of `length` rows down the run
    # (<= so the final window ending at the last row is also checked)
    while i <= run.shape[0] - length:
        stk = run[i:(i + length), :]
        if not np.any(np.isnan(stk)):
            # clean window: keep it and jump past it (non-overlapping)
            contigs.append(stk)
            startindexes.append(i)
            i += length
        else:
            # window contains a nan: advance one row and retry
            i += 1
    return contigs, startindexes
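To sanity-check this on something small, here is a standalone driver on a synthetic 6-row run with 3 channels instead of 19 (just to keep it readable) and length 2; the function body is repeated so the snippet runs on its own:

```python
import numpy as np

def generate_contigs(run, length):
    i = 0
    contigs, startindexes = [], []
    while i <= run.shape[0] - length:
        stk = run[i:(i + length), :]
        if not np.any(np.isnan(stk)):
            # clean window: keep it and skip past it (non-overlapping)
            contigs.append(stk)
            startindexes.append(i)
            i += length
        else:
            i += 1
    return contigs, startindexes

# 6 cycles x 3 channels; rows 0 and 3 are artifacted with nan
run = np.array([
    [np.nan, 7.0, 5.0],
    [0.0,    6.0, 7.0],
    [6.0,    8.0, 7.0],
    [3.0, np.nan, 1.0],
    [1.0,    2.0, 0.0],
    [5.0,    4.0, 3.0],
])

contigs, starts = generate_contigs(run, length=2)
print(starts)                       # [1, 4]
print([c.shape for c in contigs])   # [(2, 3), (2, 3)]
```

The returned startindexes list gives you the start row of each epoch for free; the stop row is just start + length, so (i, i + length) tuples are easy to build afterwards.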