Tags: python, python-3.x, tensorflow, tensorboard, object-detection-api

TensorFlow graph nodes are exchanged


I have trained a model by fine-tuning the pre-trained ssd_mobilenet_v2_coco_2018 model. I used the exact same pipeline.config file for training that is available inside the ssd_mobilenet_v2_coco_2018 pre-trained folder; I only removed the batch_norm_trainable: true flag and changed the number of classes (4). After training the model on my custom dataset with 4 classes, I found that the concat and concat_1 nodes have been exchanged with each other. In the pre-trained model, concat has shape 1x1917x1x4; after training, concat has shape 1x1917x5. I have attached TensorBoard graph visualisations of both; the first image is the graph of the pre-trained ssd_mobilenet_v2_coco_2018 model.

[TensorBoard graph of the pre-trained model]
[TensorBoard graph of the fine-tuned model]

The node exchange can be seen in the rightmost corner of the image. In the pre-trained graph, the Postprocessor layer connects to concat_1 and Squeeze connects to concat, but after training the graph shows the reverse: the Postprocessor layer connects to concat and Squeeze connects to concat_1. Furthermore, in the pre-trained model graph the Preprocessor takes ToFloat as its input, while after training the graph shows Cast as the input to the Preprocessor. I feed the input to the model as tfrecords.


Solution

  • Most probably, the difference is not in the graph itself but simply in the names of the nodes, i.e. the nodes concat and concat_1 on the left are the same nodes as, respectively, concat_1 and concat on the right.

    The thing is, when you don't provide an explicit name for a node, tensorflow needs to come up with one, and its naming convention is rather uninventive. The first time it needs to name a node, it uses the node's type. When it encounters the same type again, it simply appends _ plus an increasing number to the name.

    Take this example:

    import tensorflow as tf
    
    x = tf.placeholder(tf.float32, (1,), name='x')
    y = tf.placeholder(tf.float32, (1,), name='y')
    z = tf.placeholder(tf.float32, (1,), name='z')
    
    xy = tf.concat([x, y], axis=0)  # named 'concat'
    xz = tf.concat([x, z], axis=0)  # named 'concat_1'
    

    The graph looks like this:

    [TensorBoard graph: concat built from x and y, concat_1 built from x and z]

    Now if we construct the same graph, but this time creating xz before xy, we get the following graph:

    [TensorBoard graph: the same ops, but with the names concat and concat_1 swapped]

    So the graph did not really change -- only the names did. This is probably what happened in your case: the same operations were created but not in the same order.

    The fact that names change for stateless nodes like concat is unimportant, because, for example, no weights will be misrouted when loading a saved model. Nonetheless, if naming stability matters to you, you can either give explicit names to your operations or place them in distinct scopes (a scope-based variant is sketched further below):

    xy = tf.concat([x, y], axis=0, name='xy')
    xz = tf.concat([x, z], axis=0, name='xz')
    

    [TensorBoard graph: the two concat ops now appear as xy and xz]
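
    Placing the ops in distinct name scopes achieves the same stability without naming every op individually: each scope prefixes the automatically generated names, so the full names no longer depend on creation order. A minimal sketch of that approach (the scope names box_branch and class_branch are just for illustration):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, (1,), name='x')
    y = tf.placeholder(tf.float32, (1,), name='y')
    z = tf.placeholder(tf.float32, (1,), name='z')

    with tf.name_scope('box_branch'):
        xy = tf.concat([x, y], axis=0)  # named 'box_branch/concat'
    with tf.name_scope('class_branch'):
        xz = tf.concat([x, z], axis=0)  # named 'class_branch/concat'

    Swapping the order in which xy and xz are created no longer changes either name, because each scope keeps its own counter.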

    It is much more problematic if variables switch names. This is one of the reasons why tf.get_variable -- which forces variables to have a name and raises an error when a name conflict occurs -- was the preferred way of dealing with variables in the pre-TF2 era.
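
    A minimal sketch of that behaviour (the scope and variable names here are purely illustrative):

    import tensorflow as tf

    with tf.variable_scope('model'):
        w = tf.get_variable('weights', shape=(3, 3), initializer=tf.zeros_initializer())

    # Requesting the same name again without reuse raises an error
    # instead of silently creating a renamed 'weights_1' variable:
    try:
        with tf.variable_scope('model'):
            tf.get_variable('weights', shape=(3, 3))
    except ValueError as e:
        print(e)  # 'Variable model/weights already exists, disallowed. ...'

    # Reusing the existing variable has to be requested explicitly:
    with tf.variable_scope('model', reuse=True):
        w_again = tf.get_variable('weights')
    assert w is w_again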