
Reasoning about consecutive data points without using iteration


I am doing SPC analysis using numpy/pandas.

Part of this is checking data series against the Nelson rules and the Western Electric rules.

For instance (rule 2 from the Nelson rules): Check if nine (or more) points in a row are on the same side of the mean.

Now, I could simply implement a check like this by iterating over the array.

  • But before I do that: does numpy/pandas offer a way to do this without iteration?
  • In any case: what is the "numpy-ic" way to implement a check like the one described above?

Solution

  • As I mentioned in a comment, you may want to try using some stride tricks.

    • First, let's make an array of the signs of your anomalies (-1 below the mean, +1 above it, 0 exactly on it); we can store it as np.int8 to save some space:

      import numpy as np

      anomalies = x - x.mean()
      signs = np.sign(anomalies).astype(np.int8)
      
    • Now for the strides. If you want to consider N consecutive points, you'll use

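      from numpy.lib.stride_tricks import as_strided

      # Row i is a read-only view of signs[i:i+N]. Moving one row or one column
      # both advance by one element (itemsize bytes) in the underlying buffer.
      strided = as_strided(signs,
                           strides=(signs.itemsize, signs.itemsize),
                           shape=(signs.size - N + 1, N),
                           writeable=False)

      That gives us an (x.size-N+1, N) rolling array: the first row is signs[0:N], the second signs[1:N+1], and so on. Do not ask for more than x.size-N+1 rows: as_strided does no bounds checking, so any extra row would read whatever memory happens to lie past the end of signs.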
      
    • Let's sum along the rows

      consecutives = strided.sum(axis=-1)
      

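      That gives us an array of size (x.size-N+1) with values between -N and +N. A window sums to ±N only if all N of its signs agree, so we just have to find where the absolute value equals N:

      (indices,) = np.nonzero(np.abs(consecutives) == N)

      indices is the array of the indices i of your array x for which the values x[i:i+N] are all on the same side of the mean. Windows overlap, so a run of M ≥ N such points shows up as M-N+1 consecutive indices.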

    Example with x=np.random.rand(10) and N=3

    >>> import numpy as np
    >>> from numpy.lib.stride_tricks import as_strided
    >>> N = 3
    >>> x = np.array([0.57016436, 0.79360943, 0.89535982, 0.83632245, 0.31046202,
    ...               0.91398363, 0.62358298, 0.72148491, 0.99311681, 0.94852957])
    >>> signs = np.sign(x - x.mean()).astype(np.int8)
    >>> signs
    array([-1,  1,  1,  1, -1,  1, -1, -1,  1,  1], dtype=int8)
    >>> strided = as_strided(signs, strides=(signs.itemsize, signs.itemsize),
    ...                      shape=(signs.size - N + 1, N), writeable=False)
    >>> strided
    array([[-1,  1,  1],
           [ 1,  1,  1],
           [ 1,  1, -1],
           [ 1, -1,  1],
           [-1,  1, -1],
           [ 1, -1, -1],
           [-1, -1,  1],
           [-1,  1,  1]], dtype=int8)
    >>> consecutive = strided.sum(axis=-1)
    >>> consecutive
    array([ 1,  3,  1,  1, -1, -1, -1,  1])
    >>> np.nonzero(np.abs(consecutive) == N)
    (array([1]),)
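On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same rolling view with the shape and strides computed for you and the window size checked against the array, which removes the main way to get as_strided wrong. A minimal sketch wrapping the whole check in one function (the name same_side_windows is hypothetical, not a library API):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def same_side_windows(x, n):
    """Return start indices i such that x[i:i+n] lies entirely on one side of the mean.

    Points exactly equal to the mean get sign 0, so they never count
    towards a same-side window.
    """
    x = np.asarray(x, dtype=float)
    if n < 1 or n > x.size:
        # sliding_window_view raises if the window is longer than the array.
        return np.array([], dtype=np.intp)
    signs = np.sign(x - x.mean()).astype(np.int8)
    # Read-only (x.size - n + 1, n) view of signs; no data is copied.
    windows = sliding_window_view(signs, n)
    # Accumulate in int64 explicitly rather than relying on default upcasting.
    sums = windows.sum(axis=-1, dtype=np.int64)
    # A window sums to +n or -n only when all n signs agree.
    return np.flatnonzero(np.abs(sums) == n)
```

For Nelson rule 2 you would call it with n=9; each returned index i marks a window x[i:i+9] that triggers the rule.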
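If you would rather not build a strided view at all, the per-window sums can also be computed with np.convolve or with a running total from np.cumsum. Both give the same (x.size-N+1) array of window sums as strided.sum(axis=-1). A sketch, with hypothetical function names and assuming 1 <= n <= len(signs):

```python
import numpy as np


def window_sums_convolve(signs, n):
    """Sum of signs[i:i+n] for every full window, via convolution with a kernel of ones.

    Assumes 1 <= n <= len(signs); np.convolve swaps its inputs if the kernel is longer.
    """
    kernel = np.ones(n, dtype=np.int64)
    # mode="valid" keeps only windows that lie fully inside the array.
    return np.convolve(signs, kernel, mode="valid")


def window_sums_cumsum(signs, n):
    """Same window sums, as differences of a running total."""
    # c[k] is the sum of signs[:k], with c[0] = 0.
    c = np.concatenate(([0], np.cumsum(signs, dtype=np.int64)))
    # c[i + n] - c[i] is the sum of signs[i:i+n].
    return c[n:] - c[:-n]
```

The cumsum version does work proportional to the array size regardless of n, while the strided and convolve versions scale with size times n; for a window of nine points any of them is fine.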
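The window approach reports every overlapping start index. If you would rather get each qualifying run once, with its start and length, you can run-length encode the signs with np.diff instead, still without a Python loop. A sketch (same_side_runs is a hypothetical helper; min_len=9 corresponds to the "nine or more" in rule 2):

```python
import numpy as np


def same_side_runs(x, min_len=9):
    """Return (starts, lengths) of maximal runs of >= min_len points on one side of the mean.

    Points exactly on the mean (sign 0) form their own runs, so they break a run.
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        empty = np.array([], dtype=np.intp)
        return empty, empty
    signs = np.sign(x - x.mean())
    # A new run starts at index 0 and wherever the sign changes.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(signs)) + 1))
    # Each run ends where the next one starts, or at the end of the array.
    ends = np.append(starts[1:], x.size)
    lengths = ends - starts
    # Keep runs that are long enough and are not made of on-the-mean points.
    keep = (lengths >= min_len) & (signs[starts] != 0)
    return starts[keep], lengths[keep]
```

A run of M points appears once here, whereas the window approach reports it as M-N+1 overlapping indices.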
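The same windowed-count idea extends to the "k out of N" style rules. As an illustration, here is a sketch of a check in the spirit of "two out of three consecutive points beyond 2σ on the same side of the mean". The helper name k_of_n_beyond is hypothetical, and estimating σ as the sample standard deviation of x is an assumption; control charts often estimate σ differently (for example from moving ranges or a baseline period), so substitute your own limits.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def k_of_n_beyond(x, k=2, n=3, n_sigma=2.0):
    """Start indices i where at least k of x[i:i+n] lie above mean + n_sigma*sigma,
    or at least k lie below mean - n_sigma*sigma (each side counted separately).

    ASSUMPTION: sigma is the sample standard deviation of x.
    """
    x = np.asarray(x, dtype=float)
    if n < 1 or n > x.size or x.size < 2:
        return np.array([], dtype=np.intp)
    mu, sigma = x.mean(), x.std(ddof=1)
    above = (x > mu + n_sigma * sigma).astype(np.int64)
    below = (x < mu - n_sigma * sigma).astype(np.int64)
    # Count flagged points in each length-n window, separately per side.
    hits_above = sliding_window_view(above, n).sum(axis=-1)
    hits_below = sliding_window_view(below, n).sum(axis=-1)
    return np.flatnonzero((hits_above >= k) | (hits_below >= k))
```

With k=1, n=1 and n_sigma=3.0 this reduces to the familiar single-point-beyond-3σ check.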