gpflow

Sequential calls to predict using VFE and SVGP models


I have a use case where I have to call predict sequentially, one point at a time. Why are SVGP predictions so much faster than SGPR predictions in that setting? Isn't the prediction time complexity the same for all sparse models, or am I missing something? This is the code I ran to test it:

from numpy.random import randn
import gpflow

sgpr = gpflow.models.SGPR(X=randn(70000, 8), Y=randn(70000, 1), kern=gpflow.kernels.SquaredExponential(8), Z=randn(200, 8))
sgpr.predict_y(randn(1,8))
%timeit -n 100 -r 7 sgpr.predict_y(randn(1,8))
>>> 128 ms ± 696 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

svgp = gpflow.models.SVGP(X=randn(70000, 8), Y=randn(70000, 1), kern=gpflow.kernels.SquaredExponential(8), likelihood=gpflow.likelihoods.Gaussian(), Z=randn(200, 8))
svgp.predict_y(randn(1,8))
%timeit -n 100 -r 7 svgp.predict_y(randn(1,8))
>>> 6.61 ms ± 913 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Solution

  • This answer is the result of https://github.com/GPflow/GPflow/issues/1030. At prediction time, SGPR recomputes the predictive posterior exactly from the whole training set on every call, which costs O(NM²) for N training points and M inducing points. SVGP instead stores that information in the much smaller variational distribution q(u), so its predictions never touch the training data and cost only O(M²) per test point. A workaround for SGPR is to implement a custom prediction method in which the training-dependent matrices are precomputed once and then reused on every call.
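As a rough illustration of that workaround, here is a minimal NumPy-only sketch (not GPflow's API; the `rbf` kernel, the `CachedSGPRPredictor` class, and its parameter names are inventions of this sketch) that caches the training-dependent terms of the Titsias/VFE predictive posterior, so repeated predict calls no longer scale with N:

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

class CachedSGPRPredictor:
    """Precompute the training-dependent pieces of the SGPR (Titsias/VFE)
    predictive posterior once, so each later predict call costs O(M^2)
    per test point instead of re-touching all N training points."""

    def __init__(self, X, Y, Z, noise_var=1.0):
        self.Z = Z
        M = Z.shape[0]
        Kuu = rbf(Z, Z) + 1e-8 * np.eye(M)   # jitter for numerical stability
        Kuf = rbf(Z, X)                      # the only O(N * M) work, done once
        # Sigma = (Kuu + Kuf Kfu / sigma^2)^{-1}
        Sigma = np.linalg.inv(Kuu + Kuf @ Kuf.T / noise_var)
        # Cached M-vector: weights for the predictive mean.
        self.alpha = Sigma @ (Kuf @ Y) / noise_var
        # Cached M x M matrix: Kuu^{-1} - Sigma, used in the predictive variance.
        self.Q = np.linalg.inv(Kuu) - Sigma

    def predict_f(self, Xnew):
        Kus = rbf(self.Z, Xnew)              # M x n_test, independent of N
        mean = Kus.T @ self.alpha
        # diag(Kss) - diag(Kus^T Q Kus), computed column-wise via einsum.
        var = (np.diag(rbf(Xnew, Xnew))
               - np.einsum("ij,ik,kj->j", Kus, self.Q, Kus))[:, None]
        return mean, var
```

The cached `alpha` and `Q` play the same role as the q(u) parameters that SVGP stores: once they are built, sequential one-point predictions cost O(M²) each, regardless of the training-set size.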