Search code examples
pythonsignal-processingperiodic-task

Finding periodicity in an algorithmic signal


In testing a conjecture about the following recursive relation

enter image description here ,

which claims a periodicity of some kind for the sequence of numbers, I wrote a python program which computes the sequences and prints them in a table.

 1   # Consider the recursive relation x_{i+1} = p-1 - (p*i-1 mod x_i)
 2   # with p prime and x_0 = 1. What is the shortest period of the
 3   # sequence?
 4   
 5   from __future__ import print_function
 6   import numpy as np
 7   from matplotlib import pyplot  as plt
 8   
 9   # The length of the sequences.
 10  seq_length = 100
 11  
 12  upperbound_primes = 30
 13  
 14  # Computing a list of prime numbers up to n
 15  def primes(n):
 16   sieve = [True] * n
 17   for i in xrange(3,int(n**0.5)+1,2):
 18     if sieve[i]:
 19         sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
 20   return [2] + [i for i in xrange(3,n,2) if sieve[i]]
 21  
 22  # The list of prime numbers up to upperbound_primes
 23  p = primes(upperbound_primes)
 24  
 25  # The amount of primes numbers
 26  no_primes = len(p)
 27  
 28  # Generate the sequence for the prime number p
 29  def sequence(p):
 30    x = np.empty(seq_length)
 31    x[0] = 1
 32    for i in range(1,seq_length):
 33      x[i] = p - 1 - (p * (i-1) - 1) % x[i-1]
 34    return x
 35  
 36  # List with the sequences.
 37  seq = [sequence(i) for i in p]  
 38  """
 39  # Print the sequences in a table where the upper row
 40  # indicates the prime numbers.
 41  for i in range(seq_length):
 42    if not i: 
 43      for n in p:
 44        print('\t',n,end='')
 45      print('')
 46    print(i+1,'\t',end='')
 47    for j in range(no_primes):
 48      print(seq[j][i],end='\t')
 49    print('\n',end='')
 50  """
 51  def autocor(x):
 52    result = np.correlate(x,x,mode='full')
 53    return result[result.size/2:]
 54  
 55  
 56  fig = plt.figure('Finding period in the sequences')
 57  k = 0
 58  for s in  seq:
 59    k = k + 1
 60    fig.add_subplot(no_primes,1,k)
 61    plt.title("Prime number %d" % p[k-1])
 62    plt.plot(autocor(s))
 63  plt.show()
 64  

Now I want to investigate periodicities in these sequences that I computed. After looking around on the net I found myself two options it seems:

  • Preform autocorrelation on the data and look for the first peak. This should give an approximation of the period.
  • Preform a FFT on the data. This shows the frequency of the numbers. I do not see how this can give any useful information about the periodicity of a sequence of numbers.

The last lines show my attempt of using autocorrelation, inspired by the accepted answer of How can I use numpy.correlate to do autocorrelation?.

It gives the following plot

enter image description here Clearly we see a descending sequence of numbers for all the primes.

When testing the same method on a sin function with the following simplyfied python-code snippet

 1   # Testing the autocorrelation of numpy
 2   
 3   import numpy as np
 4   from matplotlib import pyplot as plt
 5   
 6   num_samples = 1000
 7   t = np.arange(num_samples)
 8   dt = 0.1
 9   
 10  def autocor(x):
 11    result = np.correlate(x,x,mode='full')
 12    return result[result.size/2:]
 13  
 14  def f(x):
 15    return [np.sin(i * 2 * np.pi * dt) for i in range(num_samples)]
 16  
 17  plt.plot(autocor(f(t)))
 18  plt.show()

I get a similar result, it giving the following plot for the sine function

enter image description here

How could I read off the periodicity in the sine-function case, for example?

Anyhow, I do not understand the mechanism of the autocorrelation leading to peaks that give information of the periodicity of a signal. Can someone elaborate on that? How do you properly use autocorrelation in this context?

Also what am I doing wrong in my implementation of the autocorrelation?

Suggestions on alternative methods of determining periodicity in a sequence of numbers are welcome.


Solution

  • There are quite a few questions here, so I'll start be describing how an autocorrelation produces the period from the case of "3", ie, your second sub-plot of the first image.

    For prime number 3, the sequence is (after a less consistent start) 1,2,1,2,1,2,1,2,.... To calculate the autocorrelation, the array is basically translated relative to itself, all the elements that align are multiplied, and all of these results are added. So it looks something like this, for a few test cases, where A is the autocorrelation:

     0  1  2  3  4  5  6  7 ... 43 44 45 46 47 48 49         # indices 0    
     1  2  1  2  1  2  1  2      2  1  2  1  2  1  2         # values  0
     1  2  1  2  1  2  1  2      2  1  2  1  2  1  2         # values  1
     0  1  2  3  4  5  6  7 ... 43 44 45 46 47 48 49         # indices 1
     1  4  1  4  1  4  1  4      4  1  4  1  4  1  4         # products
     # above result A[0] = 5*25  5=1+4   25=# of pairs       # A[0] = 125
    
    
     0  1  2  3  4  5  6  7 ... 43 44 45 46 47 48 49         # indices 0    
     1  2  1  2  1  2  1  2      2  1  2  1  2  1  2         # values  0
        1  2  1  2  1  2  1  2      2  1  2  1  2  1  2         # values  1
        0  1  2  3  4  5  6  7 ... 43 44 45 46 47 48 49         # indices 1
        2  2  2  2  2  2  2      2  2  2  2  2  2  2         # products
     # above result A[1] = 4*24  4=2+2   24=# of pairs       # A[1] = 96
    
     0  1  2  3  4  5  6  7 ... 43 44 45 46 47 48 49         # indices 0    
     1  2  1  2  1  2  1  2      2  1  2  1  2  1  2         # values  0
           1  2  1  2  1  2  1  2      2  1  2  1  2  1  2         # values  1
           0  1  2  3  4  5  6  7 ... 43 44 45 46 47 48 49         # indices 1
           1  4  1  4  1  4  1  4      4  1  4  1  4         # products
     # above result A[2] = 5*23  5=4+1   23=# of pairs       # A[2] = 115
    

    There are three take-home messages from the above: 1. the autocorrelation, A, has larger value when like elements are lined up and multiplied, here at every other step. 2. The index of the autocorrelation corresponds to the relative shift. 3. When doing the autocorrelation over the full arrays, as shown here, there's always a downward ramp since the number of points added together to produce the value are reduced at each successive shift.

    So here you can see why there's a periodic 20% bump in your graph from "Prime number 3": because the terms that are summed are 1+4 when they are aligned, vs 2+2 when they aren't, ie, 5 vs 4. It's this bump that you're looking for in reading off the period. That is, here is shows that the period is 2, since that is the index of your first bump. (Also, note btw, in the above I only do the calculation as number of pairs to see how this known periodicity leads to the result you see in the autocorrelation, that is, one doesn't in general want to think of number of pairs.)

    In these calculations, the values of the bump relative to the base will be increased if you first subtract the mean before doing the autocorrelation. The ramp can be removed if you do the calculation using arrays with trimmed ends, so there's always the same overlap; this usually makes sense since usually one is looking for a periodicity of much shorter wavelength than the full sample (because it takes many oscillations to define a good period of oscillation).


    For the autocorrelation of the sine wave, the basic answer is that the period is shown as the first bump. I redid the plot except with the time axis applied. It's always clearest in these things to use a real time axis, so I changed your code a bit to include that. (Also, I replaced the list comprehension with a proper vectorized numpy expression for calculating the sin wave, but that's not important here. And I also explicitly defined the frequency in f(x), just to make it more clear what's going on -- as an implicitly frequency of 1 in confusing.)

    The main point is that since the autocorrelation is calculated by shifting along the axis one point at a time, the axis of the autocorrelation is just the time axis. So I plot that as the axis, and then can read the period off of that. Here I zoomed in to see it clearly (and the code is below):

    enter image description here

    # Testing the autocorrelation of numpy
    
    import numpy as np
    from matplotlib import pyplot as plt
    
    num_samples = 1000
    dt = 0.1    
    t = dt*np.arange(num_samples)   
    
    def autocor(x):
      result = np.correlate(x,x,mode='full')
      return result[result.size/2:]
    
    def f(freq):
      return np.sin(2*np.pi*freq*t)    
    
    plt.plot(t, autocor(f(.3)))
    plt.xlabel("time (sec)")
    plt.show()                                              
    

    That is, in the above, I set the frequency to 0.3, and the graph shows the period as about 3.3, which is what's expected.


    All of this said, in my experience, the autocorrelation generally works well for physical signals but not so reliably for algorithmic signals. It's fairly easy to throw off, for example, if a periodic signal skips a step, which can happen with an algorithm, but is less likely with a vibrating object. You'd think that it should be trivial to calculate that period of an algorithmic signal, but a bit of searching around will show that it's not, and it's even difficult to define what's meant by period. For example, the series:

    1 2 1 2 1 2 0 1 2 1 2 1 2
    

    won't work well with the autocorrelation test.