Tags: gpflow, gaussian-process

Questions about Bayesian GP-LVM implementation details


I want to understand how the Bayesian GP-LVM implementation works in GPflow, but I am struggling with a few lines of the code. I would greatly appreciate any help with the following questions:

  1. I understand that matrix B in line 178 of gplvm.py:

B = AAT + tf.eye(num_inducing, dtype=default_float())

corresponds to $\beta\Psi_2 + K_{MM}$ in Eq. 14 of Titsias and Lawrence 2010. However, I don't understand how the code implements this expression.

  2. Related to the previous question, what do A, tmp, AAT and c mean in lines 175-181 of gplvm.py?
A = tf.linalg.triangular_solve(L, tf.transpose(psi1), lower=True) / sigma
tmp = tf.linalg.triangular_solve(L, psi2, lower=True)
AAT = tf.linalg.triangular_solve(L, tf.transpose(tmp), lower=True) / sigma2
B = AAT + tf.eye(num_inducing, dtype=default_float())
LB = tf.linalg.cholesky(B)
log_det_B = 2.0 * tf.reduce_sum(tf.math.log(tf.linalg.diag_part(LB)))
c = tf.linalg.triangular_solve(LB, tf.linalg.matmul(A, Y_data), lower=True) / sigma

I am guessing the code is using the matrix inversion lemma, but I cannot see how.

  3. In Eq. 14 from Titsias and Lawrence 2010, there are three terms whose computation in gplvm.py I cannot understand:

    • $0.5\,\beta^2 y_d^T \Psi_1 (\beta\Psi_2 + K_{MM})^{-1} \Psi_1^T y_d$ (this formula appears in the expression for W below Eq. 14)
    • $0.5\, D \beta \,\mathrm{Tr}(K_{MM}^{-1} \Psi_2)$
    • $0.5\, D \log |K_{MM}|$

I would greatly appreciate any hint.

Cordially, Joaquin


Solution

  • The code that computes the ELBO in gplvm.py is very elegant and efficient. In case anyone wants to understand it, I answer my previous questions below and have posted further notes.

    1. I understand that matrix B in line 182 of gplvm.py:

    B = AAT + tf.eye(num_inducing, dtype=default_float())

    corresponds to $\beta\Psi_2 + K_{MM}$ in Eq. 14 of Titsias and Lawrence 2010. However, I don't understand how the gplvm code implements the expression in the paper.

    Denote the matrix $\beta\Psi_2 + K_{MM}$ in Eq. 14 of Titsias and Lawrence 2010 (TL10 from here on) by D. In gplvm.py this matrix is computed in factored form as $D = L B L^T$, where B is the matrix given above (i.e., B = AAT + I) and L is the Cholesky factor of $K_{MM}$ (i.e., $K_{MM} = L L^T$).
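    To see why, note that the code builds AAT as $\sigma^{-2} L^{-1} \Psi_2 L^{-T}$ (two triangular solves against the symmetric matrix $\Psi_2$). With $\beta = 1/\sigma^2$,

    $$L B L^T = L \left( \sigma^{-2} L^{-1} \Psi_2 L^{-T} + I \right) L^T = \beta \Psi_2 + L L^T = \beta \Psi_2 + K_{MM}.$$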

    2. Related to the previous question, what do A, tmp, AAT and c mean in the code?

    A = tf.linalg.triangular_solve(L, tf.transpose(psi1), lower=True) / sigma
    tmp = tf.linalg.triangular_solve(L, psi2, lower=True)
    AAT = tf.linalg.triangular_solve(L, tf.transpose(tmp), lower=True) / sigma2
    B = AAT + tf.eye(num_inducing, dtype=default_float())
    LB = tf.linalg.cholesky(B)
    log_det_B = 2.0 * tf.reduce_sum(tf.math.log(tf.linalg.diag_part(LB)))
    c = tf.linalg.triangular_solve(LB, tf.linalg.matmul(A, Y_data), lower=True) / sigma

    I am guessing the code is using the matrix inversion lemma, but I cannot see how.

    The code does not use the matrix inversion lemma.

    The data term in Eq. 14 of TL10 (i.e., the term in the exponential) is computed as the squared 2-norm of the vector c.
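    To verify this, substitute $A = \sigma^{-1} L^{-1} \Psi_1^T$ and $B^{-1} = L_B^{-T} L_B^{-1}$ (with $B = L_B L_B^T$, where $L_B$ is LB in the code):

    $$\|c\|^2 = \sigma^{-2} y_d^T A^T B^{-1} A y_d = \sigma^{-4} y_d^T \Psi_1 (L B L^T)^{-1} \Psi_1^T y_d = \beta^2 y_d^T \Psi_1 (\beta \Psi_2 + K_{MM})^{-1} \Psi_1^T y_d.$$

    (The code computes c for all columns of Y_data at once, so summing the squares of all entries of c accumulates the data terms of every output dimension d.)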

    AAT corresponds to the matrix that appears inside the trace of the last term in Eq. 14 of TL10 (i.e., $K_{MM}^{-1}\Psi_2$): the two matrices have the same trace, with the factor $\beta$ already absorbed into AAT.
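    This follows from the cyclic property of the trace and $L^{-T} L^{-1} = (L L^T)^{-1} = K_{MM}^{-1}$:

    $$\mathrm{Tr}(\mathrm{AAT}) = \sigma^{-2}\, \mathrm{Tr}(L^{-1} \Psi_2 L^{-T}) = \sigma^{-2}\, \mathrm{Tr}(\Psi_2 L^{-T} L^{-1}) = \beta \, \mathrm{Tr}(K_{MM}^{-1} \Psi_2).$$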

    3. In Eq. 14 from Titsias and Lawrence 2010, there are three terms whose computation I could not understand:
      • $0.5\,\beta^2 y_d^T \Psi_1 (\beta\Psi_2 + K_{MM})^{-1} \Psi_1^T y_d$

      • $0.5\, D \beta \,\mathrm{Tr}(K_{MM}^{-1} \Psi_2)$

      • $0.5\, D \log |K_{MM}|$

    As mentioned above, the first term is calculated as the squared 2-norm of the vector c, and the second term as the trace of AAT (the factor $\beta$ is already absorbed into AAT). The difference of the two log determinants in Eq. 14 of TL10 (which includes the third term) is calculated as log |B|.
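    The determinant identity follows from $\beta\Psi_2 + K_{MM} = L B L^T$ and $|K_{MM}| = |L|\,|L^T|$:

    $$\log |B| = \log \left| L^{-1} (\beta\Psi_2 + K_{MM}) L^{-T} \right| = \log |\beta\Psi_2 + K_{MM}| - \log |K_{MM}|,$$

    which the code reads off the diagonal of LB, the Cholesky factor of B.

    For anyone who wants a numerical confirmation, here is a minimal NumPy sanity check of all the identities above. It is a sketch, not GPflow code: the sizes, sigma, and the random SPD matrices standing in for $K_{MM}$, $\Psi_1$ and $\Psi_2$ are made up, and plain solves replace the triangular solves for brevity.

    # Minimal NumPy sanity check of the identities above (illustrative
    # stand-ins for the quantities in gplvm.py, single output column y_d).
    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 5, 20                      # inducing points, data points
    sigma = 0.7
    sigma2, beta = sigma**2, 1.0 / sigma**2

    # Random SPD stand-ins for K_MM and Psi_2; random Psi_1 and y_d.
    Kmm = np.cov(rng.standard_normal((M, 3 * M))) + M * np.eye(M)
    psi2 = np.cov(rng.standard_normal((M, 3 * M)))
    psi1 = rng.standard_normal((N, M))
    y = rng.standard_normal(N)

    # Mirror the computations in gplvm.py.
    L = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(L, psi1.T) / sigma
    tmp = np.linalg.solve(L, psi2)
    AAT = np.linalg.solve(L, tmp.T) / sigma2
    B = AAT + np.eye(M)
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / sigma

    D = beta * psi2 + Kmm             # beta*Psi_2 + K_MM from Eq. 14
    assert np.allclose(L @ B @ L.T, D)                    # D = L B L^T
    assert np.isclose(c @ c,                              # data term
                      beta**2 * y @ psi1 @ np.linalg.solve(D, psi1.T @ y))
    assert np.isclose(np.trace(AAT),                      # trace term
                      beta * np.trace(np.linalg.solve(Kmm, psi2)))
    assert np.isclose(2.0 * np.sum(np.log(np.diag(LB))),  # log-determinant
                      np.linalg.slogdet(D)[1] - np.linalg.slogdet(Kmm)[1])
    print("all identities verified")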

    Beautiful piece of code. Thanks.