Tags: python, time-series, fft, distribution, curve-fitting

Time Series - Approach for Identifying Square Wave-type Signals


I am trying to find an approach for filtering out signals that have a pattern like the one below.

The pattern can be described as containing square waves: the signal holds a nearly constant value, fluctuating by ±1, ±2, or not at all, over many time periods. It will often drop by 5-100 standard deviations instantaneously, remain at a constant value for a very short period of time, and then shoot back up again. These signals can contain one or several square waves of varying length, but they always exhibit at least one square wave.


Data for this signal:

y = array([  8.,   8., 173., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 173., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 173., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 130., 130., 130., 130., 130., 130., 130., 130.,130., 130., 130., 130., 130., 130., 130., 130., 130., 130., 130.,130., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 173., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172.,172., 172., 172., 172., 172., 131., 131., 131., 131., 131., 131.,131., 131., 131., 131., 131., 131., 131., 131., 131., 131., 131.,172., 172., 172., 172., 173., 172., 172., 172., 172., 172., 173.,172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172.,172., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 173., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 173.,172., 172., 172., 172., 172., 172., 172., 172., 173., 172., 172.,172., 172., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 173., 172., 172., 172., 172., 172., 172., 172., 172., 172.,172., 172., 172., 172., 172.])

I need an approach that can help me cluster or filter out these signals from a set of about 3000 signals. I've tried the following, with very mixed results:

  • Univariate time series clustering with the tslearn and DTW Python packages on a number of variance-related features (mixed results)
  • Multivariate clustering with K-Means, KNN, etc. (this can assign multiple clusters to an individual signal; the rule is one bucket per signal, not multiple buckets)
  • Conditional logic that finds subsequences in arrays, hoping to find the square waves (I can't do anything with this because half the length of a good signal can be equal to half the length of the important part of the signal, the square wave)
  • Kernel density estimation (I have other signals that have the same distribution as this signal, so I cannot filter these out based on ranking/clustering of coefficients)

Can you recommend other approaches that would help me identify this type of signal within a group of other signals? If your approach is based on Fourier transforms, can you provide an example of how I might use it to filter out this signal from a group of other signals?


Solution

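  • This will do it. Take the first derivative of the signal, find the runs where the derivative is zero, and label the signal as a square wave when a long enough flat stretch ends in a large jump: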

    import numpy as np

    def first_der(df):
        # First derivative dy/dx of the signal (NREVS) with respect to cum_int.
        y = df.NREVS.values
        x = df.cum_int.values

        dy = np.diff(y, 1)
        dx = np.diff(x, 1)
        yfirst = dy / dx
        return yfirst

    def zero_runs(yfirst):
        # Create an array that is 1 where yfirst is 0, and pad each end with an extra 0.
        iszero = np.concatenate(([0], np.equal(yfirst, 0).view(np.int8), [0]))
        absdiff = np.abs(np.diff(iszero))
        # Runs start and end where absdiff is 1; each row is [start, end) of one zero run.
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        return ranges

    def square_finder(yfirst, yind, df):
        # yind is the array of zero runs returned by zero_runs(yfirst).
        thresh = 4  # minimum flat length required before a jump
        # Start at 1: each run is compared with the previous one, and i = 0
        # would wrap around to the last run through the i-1 index.
        for i in range(1, yind.shape[0]):
            end = yind[i][1]  # index of the first nonzero derivative after the run
            # Is the flat run followed by a jump larger than 5 in magnitude?
            if end < len(yfirst) and abs(yfirst[end]) > 5:
                # Distance from the end of the previous zero run to the end of
                # this one (less 2), used as the length of the flat stretch.
                zeros = end - yind[i - 1][1] - 2
                if zeros >= thresh:
                    df['category'] = 'square'
        return df
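
To make the derivative-and-zero-runs idea easier to try without a DataFrame, here is a minimal self-contained sketch that works on a plain array. It is a variation and not the exact logic above: the helper name `has_square_wave` and the parameters `jump` and `min_flat` are hypothetical, it assumes evenly spaced samples (so `np.diff` stands in for dy/dx), it assumes the plateau is exactly constant, and it requires a large jump on both sides of the plateau.

```python
import numpy as np

def zero_runs(a):
    """Return an (n, 2) array of [start, end) index pairs, one per run of zeros in a."""
    iszero = np.concatenate(([0], np.equal(a, 0).astype(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

def has_square_wave(y, jump=5, min_flat=4):
    """Hypothetical helper: True if a constant plateau of at least min_flat
    samples has a step larger than `jump` immediately before and after it."""
    d = np.diff(np.asarray(y, dtype=float))
    for start, end in zero_runs(d):
        # A run touching either end of the signal has no step on that side.
        if start == 0 or end >= len(d):
            continue
        flat_len = end - start + 1  # number of samples on the plateau
        # d[start - 1] is the step onto the plateau, d[end] the step off it.
        if flat_len >= min_flat and abs(d[start - 1]) > jump and abs(d[end]) > jump:
            return True
    return False

# Synthetic signal modeled on the question's data: level 172 with one dip to 130.
signal = [172.0] * 10 + [130.0] * 20 + [172.0] * 10
print(has_square_wave(signal))
```

With these assumed thresholds, the ±1 jitter in the question's data is never flagged because its steps stay below `jump`, and a single level shift that never returns is not flagged either because the plateau has no step on its far side.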