python time-series fft distribution curve-fitting

Time Series - Approach for Identifying Square Wave-type Signals

I am trying to find an approach for filtering out signals that have a pattern like the one below.

The pattern can be described as containing square waves. The signal is usually near-constant, fluctuating by ±1, ±2, or not at all over many time periods. It will often drop by 5 to 100 standard deviations instantaneously, stay at a constant level for a very short period of time, and then shoot back up again. These signals can contain one or several square waves of varying lengths, but they always contain at least one.

Data for this signal:

y = array([  8.,   8., 173., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 173., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 173., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 130., 130., 130., 130., 130., 130., 130., 130.,130., 130., 130., 130., 130., 130., 130., 130., 130., 130., 130.,130., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 173., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172.,172., 172., 172., 172., 172., 131., 131., 131., 131., 131., 131.,131., 131., 131., 131., 131., 131., 131., 131., 131., 131., 131.,172., 172., 172., 172., 173., 172., 172., 172., 172., 172., 173.,172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172.,172., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 173.,172., 172., 172., 172., 172., 172., 172., 172., 173., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172.])

I need an approach that can help me cluster or filter these signals out of a set of about 3000 signals. I've tried the following, with very mixed results:

Univariate time series clustering with the tslearn and DTW Python packages on a number of variance-related features (mixed results)
Multivariate clustering with K-Means, KNN, etc. (this can assign multiple clusters to an individual signal; the rule is one bucket per signal, not multiple buckets)
Conditional logic that finds subsequences in arrays, hoping to find the square waves (I can't do anything with this because half the length of a good signal can equal half the length of the important part of the signal, the square wave)
Kernel Density Estimation (I have other signals with the same distribution as this signal, so I cannot filter these out based on ranking/clustering of coefficients)

Can you recommend other approaches that would help me identify this type of signal within a group of other signals? If your approach uses Fourier transforms, can you provide an example of how I might use them to filter this signal out of a group of other signals?

Solution

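This will do it: take the first derivative of the signal, find the runs where the derivative is zero (the flat stretches), and flag the signal as a square wave when a long enough flat stretch ends in a large jump.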

import numpy as np

def first_der(df):
    # First derivative dy/dx of the signal, by finite differences.
    y = df.NREVS.values
    x = df.cum_int.values

    dy = np.diff(y, 1)
    dx = np.diff(x, 1)
    yfirst = dy / dx
    return yfirst

def zero_runs(yfirst):
    # Create an array that is 1 where yfirst is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(yfirst, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1; each row is a [start, end) pair.
    yind = np.where(absdiff == 1)[0].reshape(-1, 2)
    return yind
  
def square_finder(yfirst, yind, df):
    # Label the signal 'square' when a flat stretch of the derivative
    # ends in a jump whose slope exceeds 5 in magnitude.
    thresh = 4
    # Start at 1: each stretch is measured from the end of the previous run,
    # so the first run has nothing to be measured against.
    for i in range(1, yind.shape[0]):
        end = yind[i][1]  # index of the first non-zero slope after run i
        if end < len(yfirst) and abs(yfirst[end]) > 5:
            #if ((yfirst[yind[i-1][1]+1] > 3) | (yfirst[yind[i-1][1]+1] < -3)):
            zeros = end - yind[i-1][1] - 2
            if zeros >= thresh:
                df['category'] = 'square'
    return df
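
To see how the three steps fit together, here is a minimal, self-contained sketch that runs the same logic on a bare NumPy array instead of a DataFrame. It assumes unit spacing between samples (so the derivative is just np.diff(y)), and has_square, jump, and the two synthetic test signals are hypothetical names and data introduced only for this illustration. The thresholds 5 and 4 are copied from the code above.

```python
import numpy as np

def zero_runs(a):
    # [start, end) index pairs for each run of zeros in `a`.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

def has_square(y, jump=5, thresh=4):
    # Hypothetical helper: same test as square_finder, but returns a bool.
    # Assumes unit sample spacing, so the first derivative is np.diff(y).
    yfirst = np.diff(np.asarray(y, dtype=float))
    yind = zero_runs(yfirst)
    for i in range(1, len(yind)):
        end = yind[i, 1]  # first non-zero slope after flat run i
        if end < len(yfirst) and abs(yfirst[end]) > jump:
            # Flat stretch measured from the end of the previous run.
            if end - yind[i - 1, 1] - 2 >= thresh:
                return True
    return False

# Synthetic signals (not taken from the question's data).
square = [172.0] * 10 + [130.0] * 10 + [172.0] * 10
noisy_flat = [172.0, 172.0, 173.0, 172.0, 172.0, 172.0, 172.0, 172.0, 173.0, 172.0]

print(has_square(square))      # flat dip bounded by large jumps
print(has_square(noisy_flat))  # only +-1 fluctuations
```

To label a batch of signals, you would call a helper like this once per signal and keep the ones that return True. The values of jump and thresh will likely need tuning to your data.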