I have a 1D array of size n which represents a signal in the time domain. I need to find the autocorrelation matrix of this signal using Python, and then compute the eigenvectors and eigenvalues of that matrix.
What I've tried is to use the toeplitz function from scipy.linalg, as follows:
import numpy as np
import scipy.linalg
res = scipy.linalg.toeplitz(c=np.asarray(signal), r=np.asarray(signal))
eigenValues, eigenVectors = np.linalg.eig(res)
I'm not sure if that's correct, because on Matlab forums I saw a quite different approach: Matlab solution.
Terminology about correlations is confusing, so let me take care in defining what it sounds like you want to compute.
"Autocorrelation matrix" is usually understood as a characterization of random vectors: for a random vector (X[1], ..., X[N]) where each element is a real-valued random variable, the autocorrelation matrix is an NxN symmetric matrix R_XX whose (i,j)th element is
R_XX[i,j] = E[X[i] ⋅ X[j]]
and E[⋅] denotes expectation.
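As an aside, for concreteness: here is a minimal sketch of how that definition is estimated in code when you do have many observations of X. The array shapes and names below are made up purely for illustration.

import numpy as np

# Hypothetical data: each row of `observations` is one realization of the
# random vector (X[1], ..., X[N]); here 1000 trials of length N = 8.
rng = np.random.default_rng(0)
observations = rng.standard_normal((1000, 8))

# Sample estimate of R_XX[i, j] = E[X[i] * X[j]]: average the outer product
# of each realization with itself over all trials.
R_XX = observations.T @ observations / observations.shape[0]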
To reasonably estimate an autocorrelation matrix, you need multiple observations of the random vector X to estimate the expectations. But it sounds like you have only one 1D array x. If we nevertheless apply the above formula, the expectations simplify away to
R_XX[i,j] = E[X[i] ⋅ X[j]] ~= x[i] ⋅ x[j].
In other words, the matrix degenerates to the outer product np.outer(x, x), a rank-1 matrix with one nonzero eigenvalue. But this is an awful estimate of R_XX and doesn't reveal new insight about the signal.
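A quick sanity check of that rank-1 claim, as a sketch with a made-up signal:

import numpy as np

x = np.random.default_rng(0).standard_normal(16)  # stand-in for your signal

R = np.outer(x, x)               # single-observation "estimate" of R_XX
eigvals = np.linalg.eigvalsh(R)  # R is symmetric, so eigvalsh applies

# All eigenvalues are numerically zero except one, which equals ||x||^2.
print(np.sort(eigvals)[::-1][:3], np.dot(x, x))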
In signal processing, a common modeling assumption is that a signal is wide-sense stationary (WSS), meaning that any time shift of the signal has the same statistics. This assumption is useful precisely because it lets the expectations above be estimated from a single observation of the signal:
R_XX[i,j] = E[X[i] ⋅ X[j]] ~= sum_n (x[i + n] ⋅ x[j + n])
where the sum over n is over all samples. For simplicity, imagine in this description that x is a signal that goes on infinitely. In practice, on a finite-length signal, something has to be done at the signal edges, but I'll gloss over this. Equivalently, by the change of variable m = i + n, we have
R_XX[i,j] = E[X[i] ⋅ X[j]] ~= sum_m (x[m] ⋅ x[j - i + m]),
with i and j only appearing together as a difference (j - i) in the right-hand side. So this autocorrelation is usually indexed in terms of the "lag" k = j - i,
R_xx[k] = sum_m (x[m] ⋅ x[k + m]).
Note that this results in a 1D array rather than a matrix. You can compute it, for instance, with scipy.signal.correlate(x, x) in Python or xcorr(x, x) in Matlab. Again, I'm glossing over boundary handling at the signal edges; see the documentation for these functions for the options they provide.
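In Python this could look like the sketch below. I'm assuming your array is named signal; correlation_lags is available in recent SciPy versions, and the mode and normalization are choices you may want to revisit.

import numpy as np
from scipy.signal import correlate, correlation_lags

x = np.asarray(signal, dtype=float)  # your 1D time-domain array

R_xx = correlate(x, x, mode='full')                   # length 2*n - 1
lags = correlation_lags(len(x), len(x), mode='full')  # lags -(n-1) ... n-1

# The zero-lag value R_xx[lags == 0] is the signal energy, sum(x**2).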
You can relate the 1D correlation array R_xx[k] with the matrix R_XX[i,j] by
R_XX[i,j] ~= R_xx[j - i]
which, like you said, is Toeplitz.
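Putting the pieces together, here is a sketch of one way to build that Toeplitz matrix and get its eigen-decomposition under the WSS assumption. The biased 1/n normalization and the choice of lags 0 to n-1 are assumptions on my part; adjust them to your needs.

import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import correlate

x = np.asarray(signal, dtype=float)  # your 1D time-domain array
n = len(x)

# Biased sample autocorrelation; zero lag sits at index n - 1 of the 'full' output.
full = correlate(x, x, mode='full') / n
r = full[n - 1:]  # R_xx[k] for k = 0, ..., n-1

# Toeplitz autocorrelation matrix: R_XX[i, j] ~= R_xx[|j - i|] for a real signal.
R_XX = toeplitz(r)

# R_XX is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors.
eigenValues, eigenVectors = np.linalg.eigh(R_XX)

Using np.linalg.eigh instead of np.linalg.eig exploits the symmetry of R_XX, so the eigenvalues come back real and sorted in ascending order.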