Tags: python, tensorflow, machine-learning, deep-learning, tensorboard

How to detect vanishing and exploding gradients with Tensorboard?


I have two sub-questions:

1) How can I detect vanishing or exploding gradients with TensorBoard, given that write_grads=True is currently deprecated in the TensorBoard callback (see "un-deprecate write_grads for fit #31173")?

2) I figured I can probably tell whether my model suffers from vanishing gradients based on the weights' distributions and histograms in the Distributions and Histograms tabs in TensorBoard. My problem is that I have no frame of reference to compare with. Currently, my biases seem to be "moving", but I can't tell whether my kernel weights (Conv2D layers) are changing enough. Can someone give me a rule of thumb for assessing this visually in TensorBoard? E.g., if only the bottom 25th percentile of kernel weights is moving, is that good enough or not? Or perhaps someone could post two reference TensorBoard images, one showing vanishing gradients and one without.

Here are my histograms and distributions. Is it possible to tell whether my model suffers from vanishing gradients? (Some layers are omitted for brevity.) Thanks in advance.

[TensorBoard screenshots of the weight histograms and distributions for several layers]


Solution

  • I am currently facing the same question and approached the problem similarly using TensorBoard.

    Even though write_grads is deprecated, you can still log the gradients for each layer of your network by subclassing tf.keras.Model and computing the gradients manually with tf.GradientTape in the train_step method.

    Something similar to this is working for me:

    import tensorflow as tf
    from tensorflow.keras import Model

    # Assumes a summary writer named train_summary_writer exists at training time,
    # created with tf.summary.create_file_writer (see the usage sketch below).

    class TrainWithCustomLogsModel(Model):

        def __init__(self, **kwargs):
            super(TrainWithCustomLogsModel, self).__init__(**kwargs)
            # Global step counter used as the x-axis of the summaries
            self.step = tf.Variable(0, dtype=tf.int64, trainable=False)

        def train_step(self, data):

            # Get batch images and labels
            x, y = data

            # Compute the batch loss
            with tf.GradientTape() as tape:
                p = self(x, training=True)
                loss = self.compiled_loss(y, p, regularization_losses=self.losses)

            # Compute gradients for each weight of the network.
            # Note: trainable_vars and gradients are lists of tensors
            trainable_vars = self.trainable_variables
            gradients = tape.gradient(loss, trainable_vars)

            # Log weights and gradients in TensorBoard
            self.step.assign_add(tf.constant(1, dtype=tf.int64))
            with train_summary_writer.as_default():
                for var, grad in zip(trainable_vars, gradients):
                    name = var.name
                    var, grad = tf.squeeze(var), tf.squeeze(grad)
                    tf.summary.histogram(name, var, step=self.step)
                    tf.summary.histogram('Gradients_' + name, grad, step=self.step)

            # Update the model's weights
            self.optimizer.apply_gradients(zip(gradients, trainable_vars))
            # Update metrics (includes the metric that tracks the loss)
            self.compiled_metrics.update_state(y, p)
            # Return a dict mapping metric names to current values
            return {m.name: m.result() for m in self.metrics}

    You should then be able to visualize the distributions of your gradients at every training step, along with the distributions of your kernels' values.
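
    For context, here is a minimal, hypothetical usage sketch (the toy Conv2D architecture, the 'logs/gradients' directory and the x_train/y_train names are placeholders, not part of the original answer): build the subclassed model with the functional API, create the summary writer that train_step expects, then train with fit() as usual.

    import tensorflow as tf
    from tensorflow.keras import layers

    # The writer name must match the one used inside train_step
    train_summary_writer = tf.summary.create_file_writer('logs/gradients')

    # Toy Conv2D architecture, just to have something to train
    inputs = tf.keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(16, 3, activation='relu')(inputs)
    x = layers.Flatten()(x)
    outputs = layers.Dense(10, activation='softmax')(x)

    model = TrainWithCustomLogsModel(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # model.fit(x_train, y_train, epochs=5)
    # Then inspect the logged histograms with: tensorboard --logdir logs/gradients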

    Moreover, it might be worth plotting the norm of the gradients over time instead of the distributions of the individual values.
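
    A rough sketch of that idea, assuming the same train_summary_writer and self.step as above, is to add scalar summaries of the gradient norms inside train_step, right after the gradients are computed:

    # Inside train_step, right after `gradients` is computed
    with train_summary_writer.as_default():
        for var, grad in zip(trainable_vars, gradients):
            # Per-layer L2 norm of the gradient, visible in the Scalars tab
            tf.summary.scalar('grad_norm/' + var.name, tf.norm(grad), step=self.step)
        # Global norm across all layers, handy for spotting exploding gradients
        tf.summary.scalar('global_grad_norm', tf.linalg.global_norm(gradients),
                          step=self.step)

    Vanishing gradients then show up as norms that collapse towards zero in the early layers, while exploding gradients show up as norms that grow by orders of magnitude.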