In the original Auto-Encoding Variational Bayes paper, the authors describe the "reparameterization trick" in Section 2.4. The trick is to break up your latent state z into a learnable mean and sigma (learned by the encoder) and add Gaussian noise. You then sample a datapoint from z (basically, you generate an encoded image) and let the decoder map the encoded datapoint back to the original image.
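A minimal NumPy sketch of why the trick is useful. Sampling z ~ N(mu, sigma^2) directly gives nothing to differentiate through, but once the noise eps ~ N(0, 1) is drawn separately, z = mu + sigma * eps is an ordinary deterministic function of mu and sigma, so the chain rule applies sample by sample. The toy loss f(z) = z**2 and the function names are hypothetical and not from the paper; since E[z^2] = mu^2 + sigma^2, the exact gradients are 2*mu and 2*sigma, which the Monte Carlo estimates should approach.

```python
import numpy as np

def reparameterized_sample(mu, log_sigma_sq, rng, n):
    """Draw n samples z = mu + sigma * eps with eps ~ N(0, 1)."""
    eps = rng.standard_normal(n)             # noise drawn independently of mu and sigma
    sigma = np.exp(0.5 * log_sigma_sq)       # same as sqrt(exp(log_sigma_sq))
    return mu + sigma * eps, eps, sigma

def pathwise_grads(mu, log_sigma_sq, rng, n=200_000):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[f(z)] for the toy loss f(z) = z**2.

    With eps fixed, dz/dmu = 1 and dz/dsigma = eps, and df/dz = 2 * z.
    """
    z, eps, sigma = reparameterized_sample(mu, log_sigma_sq, rng, n)
    df_dz = 2.0 * z
    grad_mu = np.mean(df_dz * 1.0)           # chain rule through dz/dmu = 1
    grad_sigma = np.mean(df_dz * eps)        # chain rule through dz/dsigma = eps
    return grad_mu, grad_sigma, sigma

rng = np.random.default_rng(0)
g_mu, g_sigma, sigma = pathwise_grads(mu=1.5, log_sigma_sq=np.log(0.25), rng=rng)
print(g_mu, 2 * 1.5)          # estimate vs. exact gradient w.r.t. mu
print(g_sigma, 2 * sigma)     # estimate vs. exact gradient w.r.t. sigma
```

In a real VAE an autodiff framework performs this chain rule automatically; the point is only that the randomness sits in eps, not in the parameters being learned.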
I have a hard time getting over how strange this is. Could someone explain a bit more about the latent variable model, specifically:
Here is an example implementation of the latent model in TensorFlow, from here:
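```python
import tensorflow as tf

# ... neural net code maps the input to the hidden layers z_mean and z_log_sigma_sq ...
self.z_mean, self.z_log_sigma_sq = \
    self._recognition_network(network_weights["weights_recog"],
                              network_weights["biases_recog"])

# Draw one sample z from a Gaussian distribution
n_z = self.network_architecture["n_z"]
# eps ~ N(0, I): the only source of randomness, independent of the network weights
eps = tf.random.normal((self.batch_size, n_z), 0, 1,
                       dtype=tf.float32)
# z = mu + sigma * epsilon, where sigma = sqrt(exp(log(sigma^2)))
# (tf.mul was renamed tf.multiply in TensorFlow 1.0)
self.z = tf.add(self.z_mean,
                tf.multiply(tf.sqrt(tf.exp(self.z_log_sigma_sq)), eps))
# ... neural net code maps z to the output ...
```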
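The same sampling step can be mirrored in plain NumPy to check shapes and statistics without TensorFlow. This is a sketch under assumptions: `sample_z` is a hypothetical helper, and the encoder outputs are random stand-ins rather than a trained recognition network. Predicting log(sigma^2) instead of sigma presumably keeps the standard deviation positive without constraining the network's output, and sqrt(exp(x)) equals exp(0.5 * x).

```python
import numpy as np

def sample_z(z_mean, z_log_sigma_sq, rng):
    """NumPy analogue of the TensorFlow sampling step.

    z_mean and z_log_sigma_sq are assumed to have shape (batch_size, n_z).
    """
    eps = rng.standard_normal(z_mean.shape)       # eps ~ N(0, I), shape (batch_size, n_z)
    sigma = np.sqrt(np.exp(z_log_sigma_sq))       # equivalently np.exp(0.5 * z_log_sigma_sq)
    return z_mean + sigma * eps

rng = np.random.default_rng(42)
batch_size, n_z = 4, 2
z_mean = rng.standard_normal((batch_size, n_z))          # stand-in for encoder output
z_log_sigma_sq = rng.standard_normal((batch_size, n_z))  # stand-in for encoder output
z = sample_z(z_mean, z_log_sigma_sq, rng)
print(z.shape)  # (4, 2)
```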
They are not assuming that the activations of the encoder follow a Gaussian distribution; they are enforcing that, among the possible solutions, one resembling a Gaussian is chosen.
An image is generated by decoding an activation (feature) vector, and those activations are distributed approximately like a Gaussian.
They do this by minimizing the KL divergence between the distribution of the activations and a Gaussian (a standard normal in the paper), as part of the training loss.
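For a diagonal Gaussian encoder and a standard normal prior, the setting used in the paper, this KL term has a well-known closed form. A minimal NumPy sketch, assuming the encoder outputs `z_mean` and `z_log_sigma_sq` as in the snippet in the question; the function name is hypothetical, and only the regularization term is shown, not the reconstruction term.

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_sigma_sq):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the latent dimensions.

    Per dimension: 0.5 * (mu^2 + sigma^2 - 1 - log(sigma^2)).
    It is zero only when mu = 0 and sigma = 1, so minimizing it pulls the
    encoder's distribution toward the standard normal prior.
    """
    return 0.5 * np.sum(np.square(z_mean)
                        + np.exp(z_log_sigma_sq)
                        - 1.0
                        - z_log_sigma_sq, axis=-1)

print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))          # 0.0
print(kl_to_standard_normal(np.array([1.0]), np.array([0.0])))  # 0.5
```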