machine-learning · deep-learning · neural-network · autoencoder · loss-function

Variational Autoencoders: MSE vs BCE


I'm working with a Variational Autoencoder and I have seen that some people use MSE loss and others use BCE loss. Does anyone know if one is more correct than the other, and why?

As far as I understand, if you assume that the output of the VAE follows a Gaussian distribution, you should use MSE loss, and if you assume it follows a multinomial distribution, you should use BCE. Also, BCE is biased towards 0.5.

Could someone clarify this concept for me? I know it's related to the variational lower bound and the expectation of the log-likelihood...

Thank you so much!


Solution

  • In short: maximizing the likelihood of a model whose predictions follow a normal (Bernoulli) distribution is equivalent to minimizing MSE (BCE).

    Mathematical details:

    The real reason you use MSE and cross-entropy loss functions

    DeepMind has an excellent lecture on Modern Latent Variable Models (mainly about Variational Autoencoders); you can find everything you need there.
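The equivalence stated in the answer can be checked numerically. Below is a minimal sketch (the array values are made up for illustration): with a fixed unit variance, the Gaussian negative log-likelihood of the reconstruction equals half the squared error plus a constant, so both have the same minimizer; and the Bernoulli negative log-likelihood is exactly the binary cross-entropy formula.

```python
import numpy as np

# Hypothetical reconstruction targets and decoder outputs, both in (0, 1)
x = np.array([0.2, 0.9, 0.5])
x_hat = np.array([0.3, 0.7, 0.4])

# Gaussian NLL per element with sigma = 1:
#   -log N(x | x_hat, 1) = 0.5 * (x - x_hat)^2 + 0.5 * log(2*pi)
gauss_nll = 0.5 * (x - x_hat) ** 2 + 0.5 * np.log(2 * np.pi)
sq_err = (x - x_hat) ** 2

# Bernoulli NLL per element: -log( x_hat^x * (1 - x_hat)^(1 - x) )
bern_nll = -np.log(x_hat ** x * (1 - x_hat) ** (1 - x))

# BCE as usually written: -(x * log(x_hat) + (1 - x) * log(1 - x_hat))
bce = -(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

# Gaussian NLL = 0.5 * squared error + constant -> same argmin as MSE
print(np.allclose(gauss_nll, 0.5 * sq_err + 0.5 * np.log(2 * np.pi)))  # True
# Bernoulli NLL is algebraically identical to BCE
print(np.allclose(bern_nll, bce))  # True
```

So the choice of reconstruction loss amounts to a choice of likelihood model for the decoder output: Gaussian gives MSE (up to scale and constant), Bernoulli gives BCE.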