I'm wondering if anyone knows how to implement a rolling/moving window PCA that reuses the calculated PCA when adding and removing measurements.
The idea is that I have a large set of data (measurements) over a very long time, and I would like to have a moving window (say, 200 days) starting at the beginning of my dataset. At each step, I include the next day's measurement and throw out the oldest one, so my window is always 200 days long. However, I would not like to simply recalculate the PCA each time.
Is it possible to make an algorithm that is more efficient than simply calculating the PCA for each window independently? Thanks in advance!
A complete answer depends on a lot of factors. I'll cover what I think are the most important ones, and hopefully that'll be enough information to point you in the right direction.
First, to answer your question directly: yes, it is possible to make an algorithm that is more efficient than calculating the PCA for each window independently.
As a first pass at the problem, let's assume that you're doing a naive PCA calculation with no normalization (i.e., you're leaving the data alone, computing the covariance matrix, and finding that matrix's eigenvalues/eigenvectors).
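Concretely, the per-window baseline we're trying to avoid repeating looks something like this (a minimal NumPy sketch; the function names here and in later snippets are just for illustration):
"""
X: shape (200, p), one window of data as a NumPy array
"""
import numpy as np

def naive_pca(X):
    W = X.T @ X  # (p, p) covariance-style matrix for this window
    eigenvalues, eigenvectors = np.linalg.eigh(W)  # ascending eigenvalue order
    return eigenvalues, eigenvectors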
When faced with an input matrix X whose PCA we want to compute, the naive algorithm first computes the covariance matrix W = X.T @ X. Once we've computed that for some window of 200 rows, we can cheaply add or remove rows from consideration by adding or removing their contributions to W.
"""
W: shape (p, p)
row: shape (1, p)
"""
def add_row(W, row):
return W + (row.T @ row)
def remove_row(W, row):
return W - (row.T @ row)
Your description of a sliding window is equivalent to removing a row and adding a new one, so we can quickly compute a new covariance matrix using O(p^2) computations rather than the O(n p^2) a typical matrix multiply would take (with n==200 for this problem).
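For concreteness, one step of the slide might look something like this (a minimal sketch, assuming you also keep the raw window around as a (200, p) array so you know which row to drop):
"""
window: shape (200, p), the current window's rows in order
new_row: shape (1, p), the incoming measurement
Relies on add_row/remove_row from above.
"""
def slide_window(W, window, new_row):
    W = remove_row(W, window[:1])              # drop the oldest row's contribution
    W = add_row(W, new_row)                    # add the newest row's contribution
    window = np.vstack([window[1:], new_row])  # keep the raw window in sync
    return W, window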
The covariance matrix isn't the final answer though; we still need to find the principal components. If you aren't hand-rolling the eigensolver yourself, there isn't a lot to be done -- you'll pay the full cost for new eigenvalues and eigenvectors every time.
However, if you are writing your own eigensolver, most such methods accept a starting guess and iterate until some termination condition is met (usually a maximum number of iterations or a small enough error, whichever you hit first). Swapping out a single data point isn't likely to drastically alter the principal components, so for typical data you can expect that re-using the existing eigenvalues/eigenvectors as the starting guess will let the solver terminate in far fewer iterations than starting from random inputs, affording an additional speedup.
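For instance, with something as simple as power iteration for the leading component, the warm start is just the initial vector (a rough sketch; a real solver would also deflate W to recover further components):
"""
W: shape (p, p), the current window's covariance matrix
v0: shape (p,), the leading eigenvector from the previous window
    (or a random unit vector for the very first window)
"""
def leading_eigenpair(W, v0, max_iter=1000, tol=1e-10):
    v = v0 / np.linalg.norm(v0)
    for _ in range(max_iter):
        v_new = W @ v
        v_new /= np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:  # converges quickly with a good warm start
            break
        v = v_new
    return float(v_new @ (W @ v_new)), v_new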
Usually (maybe always?), covariance-free PCA algorithms use some kind of iterative solver (much like an eigensolver), but they have computational shortcuts that let them find eigenvalues/eigenvectors without explicitly materializing the covariance matrix.
Any individual method might have additional tricks that let you carry information from one window to the next, but in general you can expect to reduce the total number of iterations simply by re-using the existing principal components to start the solver instead of random inputs (much like in the eigensolver case above).
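As one concrete (if minimal) example of such a method, a NIPALS-style iteration for the first component only ever touches X through matrix-vector products, and it can be warm-started the same way:
"""
X: shape (200, p), the current (centered) window
v0: shape (p,), the first principal direction from the previous window
Never forms X.T @ X explicitly -- each iteration costs O(n*p).
"""
def first_component(X, v0, max_iter=1000, tol=1e-10):
    v = v0 / np.linalg.norm(v0)
    for _ in range(max_iter):
        t = X @ v        # scores, shape (200,)
        v_new = X.T @ t  # loadings, shape (p,)
        v_new /= np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:
            break
        v = v_new
    return v_new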
Supposing you're normalizing each window to have a mean of 0 in each column (common in PCA), you'll have some additional work when modifying the covariance matrix.
First, I'll assume you already have a rolling mechanism for keeping track of any differences that need to be applied from one window to the next. If not, consider something like the following:
"""
We're lazy and don't want to handle a change in sample
size, so only work with row swaps -- good enough for
a sliding window.
old_row: shape (1, p)
new_row: shape (1, p)
"""
def replaced_row_mean_adjustment(old_row, new_row):
return (new_row - old_row)/200. # whatever your window size is
The effect on the covariance matrix isn't too bad to compute, but I'll put some code here anyway.
"""
W: shape (p, p)
center: shape (1, p)
exactly equal to the mean diff vector we referenced above
X: shape (200, p)
exactly equal to the window you're examining after any
previous mean centering has been applied, but before the
current centering has happened. Note that we only use
its row and column sums, so you could get away with
a rolling computation for those instead, but that's
a little more code, and I want to leave at least some
of the problem for you to work on
"""
def update_covariance(W, center, X):
result = W
result -= center.T @ np.sum(X, axis=0).reshape(1, -1)
result -= np.sum(X, axis=1).reshape(-1, 1) @ center
result += 200 * center.T @ center # or whatever your window size is
return result
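To see how these pieces fit together, one slide of a mean-centered window might look roughly like this (a sketch; slide_centered_window is just an illustrative name, and it lets update_covariance recompute the column sums from scratch rather than rolling them):
"""
window_c: shape (200, p), the current window centered with the current mean
mean: shape (1, p), the current window mean
old_row, new_row: shape (1, p), the raw rows being swapped
Relies on add_row, remove_row, replaced_row_mean_adjustment, and
update_covariance from above.
"""
def slide_centered_window(W, window_c, mean, old_row, new_row):
    # Swap the rows while everything is still centered at the old mean.
    W = remove_row(W, old_row - mean)
    W = add_row(W, new_row - mean)
    X = np.vstack([window_c[1:], new_row - mean])  # new window, old centering
    # Re-center around the new mean.
    center = replaced_row_mean_adjustment(old_row, new_row)
    W = update_covariance(W, center, X)
    return W, X - center, mean + center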
Rescaling to have a standard deviation of 1 is also common in PCA. That's pretty easy to accommodate as well.
"""
Updates the covariance matrix assuming you're modifing a window
of data X with shape (200, p) by multiplying each column by
its corresponding element in v. A rolling algorithm to compute
v isn't covered here, but it shouldn't be hard to figure out.
W: shape (p, p)
v: shape (1, p)
"""
def update_covariance(W, v):
return W * (v.T @ v) # Note that this is element-wise multiplication of W
The tricks available here will vary quite a bit depending on the algorithm you're using, but the general strategy I'd try first is this: keep a rolling estimate of the mean and standard deviation of each column for the current window, and modify the iterative solver to take those into account. That is, given a window X you really want to iterate on a rescaled window Y; substitute Y = a*X + b into the iterative algorithm of your choice and simplify symbolically, which will hopefully yield a version with only a small additional constant cost per iteration.
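As one concrete instance of that substitution, the core matrix-vector product of a power-style iteration on standardized data Y = (X - mean) * inv_sigma can be rewritten so Y is never materialized (a sketch; the rolling updates for mean and inv_sigma aren't shown):
"""
X: shape (200, p), the raw window
mean: shape (1, p), the current column means
inv_sigma: shape (1, p), the current reciprocal column standard deviations
v: shape (p,), the current iterate
Computes Y.T @ (Y @ v) where Y = (X - mean) * inv_sigma, at the cost of
two products with X plus O(p) extra work.
"""
def standardized_matvec(X, mean, inv_sigma, v):
    w = inv_sigma.ravel() * v             # D @ v
    t = X @ w - (mean.ravel() @ w)        # Y @ v, shape (200,)
    u = X.T @ t - mean.ravel() * t.sum()  # (X - 1*mean).T @ t
    return inv_sigma.ravel() * u          # Y.T @ (Y @ v)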
As before, you'll want to re-use any principal components you find instead of using a random initialization vector for each window.