python, tensorflow, keras, tensorboard, batch-normalization

Keras - First batch norm layer shown as input to every other batch norm layer in tensorboard, why is this the case?


Sorry if this is something obvious that I'm missing; I've tried /r/learnmachinelearning with no luck, so I thought I'd try here.

I am using the code below to string together a bunch of predefined CNN blocks:

    # (imports assumed; swap `keras` for `tensorflow.keras` if that's your setup)
    from keras import backend as K
    from keras.layers import (Activation, BatchNormalization, Concatenate, Conv2D,
                              Embedding, Flatten, Lambda, MaxPooling2D)
    from keras.models import Sequential

    def cnn_block_1(inp, filt, kernal, b_num):
        # Layer Names
        c1_left, b1_left, a1_left, c2_left, b2_left, a2_left, c3_left, b3_left, a3_left, p_left = \
        "c1_left_" + b_num, "b1_left_" + b_num, "a1_left_" + b_num, "c2_left_" + b_num, \
        "b2_left_" + b_num, "a2_left_" + b_num, "c3_left_" + b_num, "b3_left_" + b_num, \
        "a3_left_" + b_num, "p_left_" + b_num,

        # Block
        c1_l = Conv2D(filters=filt, kernel_size=kernal, padding="same", name=c1_left)(inp)
        bn1_l = BatchNormalization(name=b1_left)(c1_l)
        a1_l = Activation("relu", name=a1_left)(bn1_l)
        c2_l = Conv2D(filters=filt, kernel_size=kernal, padding="same", name=c2_left)(a1_l)
        bn2_l = BatchNormalization(name=b2_left)(c2_l)
        a2_l = Activation("relu", name=a2_left)(bn2_l)
        c3_l = Conv2D(filters=filt, kernel_size=kernal, padding="same", name=c3_left)(a2_l)
        bn3_l = BatchNormalization(name=b3_left)(c3_l)
        a3_l = Activation("relu", name=a3_left)(bn3_l)
        p_l = MaxPooling2D(padding="same", name=p_left)(a3_l)

        return p_l

    left_arm_blocks = 6
    filter_value = 2

    x1 = Sequential()
    x1.add(Embedding(vocab_char_size, embedding_char_size, input_length=maxlen, mask_zero=True, weights=[e_char], name='embedding_1', trainable=False))
    x1.add(Lambda(lambda xo: K.expand_dims(xo, axis=3)))

    x2 = Sequential()
    x2.add(Embedding(vocab_word_size, embedding_word_size, input_length=maxlen, mask_zero=True, weights=[e_word], name='embedding_2', trainable=False))
    x2.add(Lambda(lambda xo: K.expand_dims(xo, axis=3)))

    c = Concatenate(axis=3)([x1.output, x2.output])
    left_string = list()
    left_string.append(c)
    f_value = filter_value
    for i in range(left_arm_blocks):
        c = cnn_block_1(left_string[-1], f_value, kernal_value, str(i))
        left_string.append(c)
        f_value *= 2

    x = Lambda(lambda xq: xq, output_shape=lambda s: s)(left_string[-1])
    flat1 = Flatten()(x)
    #etc....

I save the output of each call to the function in a list and use the last entry in the list as the input to the next block, and so on. (Originally I just passed the previous output straight in as the next input, but I switched to a list so I could be sure I wasn't going crazy in that regard.)

When I load the model in TensorBoard to have a look at the architecture, though, something bizarre is happening: https://i.sstatic.net/vJe0U.png

Here's that node expanded: https://i.sstatic.net/fTW1e.png and a closer view: https://i.sstatic.net/BBGzl.png

Here's the same model with no function, just the CNN layers written out directly: https://i.sstatic.net/RxfkM.png

For some reason it shows the first batch norm layer, "b1_left_0", as an input to every other batch norm layer in my entire model, including the entire other "right" arm of the model, which is only connected to this one via a Concatenate layer much later on.

I assume I'm missing something obvious and being dumb, but I'm at a loss as to how to investigate this further, since everything in my code seems to be working as intended.

Thanks in advance for any advice.


Solution

  • The graph is correct - that's how Keras represents certain operations like batch norm.

    It creates a node in the graph that performs the operation (it keeps the first one it encounters, b1_left_0 in your case) and references it from every other node that performs the same operation. The TensorBoard visualization is not well suited to a graph created with Keras, so in your case it is better to check the output of the model.summary() method to see whether Keras built the graph correctly.
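
    If you want to go beyond model.summary(), you can also print the tensors that actually feed each layer. Below is a minimal sketch, assuming model is the functional Model you build from the two embedding inputs and the final output (the variable name model and the Model(...) call are mine, not from your code). Every Keras tensor's name is prefixed with the layer that produced it, so for example "b2_left_0" should be fed by a "c2_left_0/..." tensor and never by "b1_left_0/...".

        # Minimal sketch: assumes `model` is the functional Model built from the
        # question's code, e.g. model = Model(inputs=[x1.input, x2.input], outputs=out)
        for layer in model.layers:
            # layer.input is a single tensor, or a list for merge layers like Concatenate
            inputs = layer.input if isinstance(layer.input, list) else [layer.input]
            print("{:<15} <- {}".format(layer.name, [t.name for t in inputs]))

    If each batch norm layer lists only the conv layer directly above it here (and the "Connected to" column of model.summary() agrees), then the model is wired the way you intended, and the extra edges in TensorBoard are just an artifact of how that shared batch norm operation is drawn in the underlying TensorFlow graph.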