python, tensorflow, machine-learning, bayesian, tensorflow-probability

What are the losses associated with the losses property of Bayesian layers in TensorFlow Probability?


TensorFlow Probability layers (e.g. DenseFlipout) have a losses property which returns the "losses associated with this layer." Can someone explain what these losses are?

After browsing the Flipout paper, I think the losses refer to the Kullback-Leibler divergence between the prior and posterior distributions of the weights and biases. If someone is more knowledgeable about these things than I am, please correct me.


Solution

  • Your suspicion is correct, although this is not well documented. For example, in the piece of code below,

    import tensorflow as tf
    import tensorflow_probability as tfp
    
    model = tf.keras.Sequential([
        tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
        tfp.layers.DenseFlipout(10),
    ])
    
    # `features` and `labels` are assumed to be defined elsewhere
    # (e.g. placeholders or a tf.data iterator).
    logits = model(features)
    
    # Per-example cross-entropy, reduced to a scalar negative log-likelihood.
    neg_log_likelihood = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    
    # Each variational layer contributes the KL divergence between its
    # weight posterior and prior to model.losses; summing them gives the
    # KL term.
    kl = sum(model.losses)
    
    # The training loss is the negative ELBO: negative log-likelihood plus KL.
    loss = neg_log_likelihood + kl
    
    train_op = tf.train.AdamOptimizer().minimize(loss)
    

    adapted from the documentation of the DenseFlipout layer, the losses exposed through model.losses are summed to form the KL term, and the negative log-likelihood is computed separately and added to it, giving the training loss (the negative of the ELBO).
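
    To see these losses yourself, you can build the model and print model.losses; each entry is a scalar KL tensor contributed by one variational layer. A minimal sketch (the 784-dimensional dummy input is just an assumption used to trigger the build):

    import tensorflow as tf
    import tensorflow_probability as tfp
    
    model = tf.keras.Sequential([
        tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
        tfp.layers.DenseFlipout(10),
    ])
    
    # Calling the model builds the layers, which registers their KL losses.
    _ = model(tf.zeros([1, 784]))
    
    # Each entry is a scalar tensor holding the KL divergence between one
    # layer's weight posterior and its prior.
    print(len(model.losses))
    print(model.losses)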

    You can see the loss being added here; following a few indirections, it turns out that the {kernel,bias}_divergence_fn is what gets called, and that in turn defaults to a lambda which computes tfd.kl_divergence(q, p) between the posterior q and the prior p.
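
    Since the default divergence is the full KL over the weights while the likelihood term above is averaged over a batch, a common adjustment is to pass your own divergence function that rescales the KL, for example by the number of training examples. This is only a sketch under that assumption; NUM_TRAIN_EXAMPLES is a placeholder for your dataset size:

    import tensorflow as tf
    import tensorflow_probability as tfp
    
    tfd = tfp.distributions
    
    NUM_TRAIN_EXAMPLES = 60000  # placeholder: set to the size of your training set
    
    def scaled_kl(q, p, _):
        # q is the weight posterior, p the prior; the third argument is the
        # sampled weight tensor, which the default divergence also ignores.
        # Same default behaviour (tfd.kl_divergence(q, p)), scaled so the
        # summed KL matches a mean-reduced log-likelihood.
        return tfd.kl_divergence(q, p) / NUM_TRAIN_EXAMPLES
    
    layer = tfp.layers.DenseFlipout(
        512,
        activation=tf.nn.relu,
        kernel_divergence_fn=scaled_kl,
        bias_divergence_fn=scaled_kl)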