I am trying to run a GP regression over 2D space + 1D time with ~8000 observations and a composite kernel with 4 Matern 3/2 covariance functions -- more than a single core can handle.
It would be great to be able to distribute the GPR computation over multiple nodes rather than having to resort to variational GP. This GitHub issue explains how to execute multithreading in GPflow 1.0, but I am not looking for a way to parallelize many predict_f calls.
Rather, I want to do GPR on a large dataset, which means inverting a covariance matrix larger than a single core can handle. Is there a way to parallelize this computation for a cluster or the Cloud?
In terms of computation, GPflow can do whatever TensorFlow does. In other words, if TensorFlow supported cloud evaluation, GPflow would support it as well. That doesn't mean you cannot implement your own version of the TensorFlow computation, perhaps a more efficient one that can run on the cloud. You can start by looking into TensorFlow custom ops: https://www.tensorflow.org/guide/create_op.
Linear algebra operations such as the Cholesky decomposition are hard to parallelise, so the time saved by distributing them would be questionable. Memory-wise, however, the advantage of cluster computing is obvious.
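To put rough numbers on the memory and compute side, here is a back-of-envelope sketch. It assumes a dense float64 covariance matrix and the standard leading-order n^3/3 flop count for a Cholesky factorisation. The helper name `dense_gp_cost` is made up for illustration and is not part of GPflow or TensorFlow.

```python
def dense_gp_cost(n, bytes_per_float=8):
    """Rough cost of exact GP regression on n observations.

    Assumes one dense n x n covariance matrix (float64 by default) and the
    leading-order n^3 / 3 flop count of a Cholesky factorisation.
    """
    cov_bytes = n * n * bytes_per_float  # storage for one n x n matrix
    chol_flops = n ** 3 / 3              # leading-order Cholesky cost
    return cov_bytes, chol_flops


cov_bytes, chol_flops = dense_gp_cost(8000)
print(f"covariance matrix: {cov_bytes / 1e6:.0f} MB")  # 512 MB
print(f"Cholesky: ~{chol_flops / 1e9:.0f} GFLOP")      # ~171 GFLOP
```

A single matrix of that size is modest, but training likely holds several such arrays at once (the kernel matrix, its Cholesky factor, and intermediates for gradients), and the factorisation is repeated at every optimisation step, so the actual peak memory and runtime will be some multiple of these figures.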