Tags: python, neural-network, keras, deep-learning, keras-layer

Constructing a Keras model


I don't understand what's happening in this code:

def construct_model(use_imagenet=True):
    # line 1: how do we keep all layers of this model?
    model = keras.applications.InceptionV3(include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                          weights='imagenet' if use_imagenet else None)

    new_output = keras.layers.GlobalAveragePooling2D()(model.output)

    new_output = keras.layers.Dense(N_CLASSES, activation='softmax')(new_output)
    model = keras.engine.training.Model(model.inputs, new_output)
    return model

Specifically, my confusion is, when we call the last constructor

model = keras.engine.training.Model(model.inputs, new_output)

we specify input layer and output layer, but how does it know we want all the other layers to stay?

In other words, we attach the new layers to the output of the pre-trained model loaded in line 1, and then in the final constructor (last line) we create and return a model with only its input and output specified. How does it know which other layers we want in between?

Side question 1): What is the difference between keras.engine.training.Model and keras.models.Model?

Side question 2): What exactly happens when we do new_layer = keras.layers.Dense(...)(prev_layer)? Does the () operation return a new layer? What does it do exactly?


Solution

  • This model was created using the functional API (the Model class).

    Basically, it works like this (reading "Side question 2" below first may make this clearer):

    • You have an input tensor (you can see it as "input data" too)
    • You create (or reuse) a layer
    • You pass the input tensor to a layer (you "call" a layer with an input)
    • You get an output tensor

    You keep working with these tensors until you have created the entire graph.

    But this hasn't created a "model" yet (something you can train, predict with, and so on).
    All you have is a graph telling which tensors go where.

    To create a model, you define its start and end points.
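
    The four steps above can be sketched as follows. This is only a minimal illustration: the layer sizes are arbitrary, and the import assumes a TensorFlow 2.x installation that exposes Keras as tensorflow.keras.

```python
from tensorflow import keras

# 1. An input tensor: a symbolic placeholder for batches of 4-feature rows.
input_tensor = keras.Input(shape=(4,))

# 2. Create a layer, 3. call it on a tensor, 4. receive an output tensor.
hidden_tensor = keras.layers.Dense(8, activation="relu")(input_tensor)
output_tensor = keras.layers.Dense(2, activation="softmax")(hidden_tensor)

# Only now is a model created, by naming the graph's start and end points.
model = keras.Model(inputs=input_tensor, outputs=output_tensor)

# The hidden Dense layer was never passed to Model, yet it is part of it.
print([layer.__class__.__name__ for layer in model.layers])
```

    Note that hidden_tensor is never handed to Model directly; it is found because output_tensor depends on it.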


    In the example:

    • They take an existing model: model = keras.applications.InceptionV3(...)
    • They want to expand this model, so they get its output tensor: model.output
    • They pass this tensor as the input of a GlobalAveragePooling2D layer
    • They get this layer's output tensor as new_output
    • They pass this as input to yet another layer: Dense(N_CLASSES, ....)
    • And get its output as new_output (the variable is overwritten because they are not interested in keeping its old value)

    But, since this is the functional API, we don't have a model yet, only a graph. To create a model, we call Model with the input tensor and the output tensor:

    new_model = Model(old_model.inputs, new_output)    
    

    Now you have your model.
    If you store it in another variable, as I did (new_model), the old model still exists in model. The two models share the same layers, so whenever you train one of them, the other gets updated as well.
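
    A quick way to see the sharing is to check object identity. The sketch below uses a tiny stand-in network instead of InceptionV3 so that it runs without downloading weights; the layer name "conv" and all sizes are made up for the illustration, and the import assumes TensorFlow 2.x.

```python
from tensorflow import keras

# Small stand-in for the pre-trained network (hypothetical sizes).
base_input = keras.Input(shape=(8, 8, 3))
features = keras.layers.Conv2D(4, 3, name="conv")(base_input)
old_model = keras.Model(base_input, features)

# Extend it the same way the question does.
new_output = keras.layers.GlobalAveragePooling2D()(old_model.output)
new_output = keras.layers.Dense(5, activation="softmax")(new_output)
new_model = keras.Model(old_model.inputs, new_output)

# Both models hold the very same layer object, hence the same weights.
shared = new_model.get_layer("conv") is old_model.get_layer("conv")
print(shared)
```

    Because the layer object is shared rather than copied, any update to its weights is visible through both models.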


    Question: how does it know what other layers we want in between?

    When you do:

    outputTensor = SomeLayer(...)(inputTensor)    
    

    you create a connection between the input and the output (Keras uses the underlying TensorFlow machinery to add these tensors and nodes to the graph). The output tensor cannot exist without the input. The entire InceptionV3 model is connected from start to end: its input tensor goes through all the layers to yield an output tensor. There is only one possible path for the data to follow, and that path is the graph.

    When you take the output of this model and use it to get further outputs, all your new outputs are connected to it, and thus to the first input of the model.

    The _keras_history attribute that Keras adds to tensors is probably closely related to how it tracks the graph.

    So, doing Model(old_model.inputs, new_output) will naturally follow the only way possible: the graph.

    If you try doing this with tensors that are not connected, you will get an error.
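
    To make the idea concrete, here is a toy sketch in plain Python. It is not how Keras is implemented; ToyTensor, ToyLayer and collect_layers are hypothetical names invented for this illustration. Each tensor simply remembers which layer produced it and from which tensor, so walking backwards from the output recovers every layer in between.

```python
# Toy illustration only: NOT Keras's real implementation.

class ToyTensor:
    def __init__(self, producer=None, parent=None):
        self.producer = producer  # layer that created this tensor (None for inputs)
        self.parent = parent      # tensor that was fed into that layer


class ToyLayer:
    def __init__(self, name):
        self.name = name

    def __call__(self, tensor):
        # Calling a layer returns a new tensor that remembers its origin.
        return ToyTensor(producer=self, parent=tensor)


def collect_layers(input_tensor, output_tensor):
    """Walk back from the output to the input, gathering layers on the way."""
    names = []
    current = output_tensor
    while current is not input_tensor:
        if current.producer is None:
            # Reached some other input: the two tensors are not connected.
            raise ValueError("Graph disconnected: output does not depend on input")
        names.append(current.producer.name)
        current = current.parent
    return list(reversed(names))


inp = ToyTensor()
out = ToyLayer("dense")(ToyLayer("pool")(ToyLayer("conv")(inp)))
print(collect_layers(inp, out))  # ['conv', 'pool', 'dense']
```

    Passing an unrelated ToyTensor() as the input makes collect_layers raise, which mirrors the error mentioned above for tensors that are not connected.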


    Side question 1

    Prefer importing from keras.models. That module simply re-exports the class from the other one: the file keras/models.py imports Model from keras.engine.training. So, it's the same thing.
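
    If in doubt, an identity check on the public names settles it. The sketch below assumes TensorFlow 2.x and deliberately avoids the internal keras.engine.training path, since internal module locations may differ or be absent depending on the installed Keras version.

```python
from tensorflow import keras

# Both public names resolve to one and the same class object.
same_class = keras.models.Model is keras.Model
print(same_class)
```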

    Side question 2

    It's not new_layer = keras.layers.Dense(...)(prev_layer).

    It is output_tensor = keras.layers.Dense(...)(input_tensor).

    You're doing two things in the same line:

    • Creating a layer - with keras.layers.Dense(...)
    • Calling the layer with an input tensor to get an output tensor

    If you wanted to use the same layer with different inputs:

    denseLayer = keras.layers.Dense(...) #creating a layer
    
    output1 = denseLayer(input1)  #calling a layer with an input and getting an output
    output2 = denseLayer(input2)  #calling the same layer on another input
    output3 = denseLayer(input3)  #again   
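
    A runnable version of this idea might look like the sketch below (sizes are arbitrary, and the import assumes TensorFlow 2.x). It shows that calling one layer on several inputs reuses a single set of weights.

```python
from tensorflow import keras

dense_layer = keras.layers.Dense(4)  # one layer, one set of weights

input1 = keras.Input(shape=(3,))
input2 = keras.Input(shape=(3,))

output1 = dense_layer(input1)  # first call builds the kernel and bias
output2 = dense_layer(input2)  # second call reuses them

model = keras.Model([input1, input2], [output1, output2])

# Two outputs, but only one kernel and one bias to train.
print(len(model.trainable_weights))
```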
    

    Bonus - Creating a functional model that is equal to a sequential model

    If you create this sequential model:

    model = Sequential()
    model.add(Layer1(...., input_shape=some_shape))   
    model.add(Layer2(...))
    model.add(Layer3(...))
    

    You're doing exactly the same as:

    inputTensor = Input(some_shape)
    outputTensor = Layer1(...)(inputTensor)
    outputTensor = Layer2(...)(outputTensor)    
    outputTensor = Layer3(...)(outputTensor)
    
    model = Model(inputTensor,outputTensor)
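
    The equivalence can be checked with concrete layers. In this sketch Dense layers stand in for Layer1, Layer2 and so on, the sizes are arbitrary, and the import assumes TensorFlow 2.x. Because each model initialises its weights randomly, the weights are copied across before the predictions are compared.

```python
import numpy as np
from tensorflow import keras

# Sequential version.
sequential = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# Functional version with the same architecture.
input_tensor = keras.Input(shape=(8,))
output_tensor = keras.layers.Dense(16, activation="relu")(input_tensor)
output_tensor = keras.layers.Dense(3, activation="softmax")(output_tensor)
functional = keras.Model(input_tensor, output_tensor)

# Layers are initialised randomly, so copy the weights before comparing.
functional.set_weights(sequential.get_weights())

x = np.random.rand(5, 8).astype("float32")
same_predictions = np.allclose(
    sequential.predict(x, verbose=0), functional.predict(x, verbose=0), atol=1e-6
)
print(same_predictions)
```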
    

    What is the difference?

    Well, functional API models can be built any way you want. You can create branches:

    out1 = Layer1(..)(inputTensor)    
    out2 = Layer2(..)(inputTensor)
    

    You can join tensors:

    joinedOut = Concatenate()([out1,out2])   
    

    With this, you can create anything you want with all kinds of fancy stuff, branches, gates, concatenations, additions, etc., which you can't do with a sequential model.
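
    Put together, a small branching model could look like this sketch (Dense layers and sizes are placeholders chosen for the illustration; the import assumes TensorFlow 2.x).

```python
from tensorflow import keras

input_tensor = keras.Input(shape=(8,))

# Two branches reading the same input.
out1 = keras.layers.Dense(4, activation="relu")(input_tensor)
out2 = keras.layers.Dense(6, activation="tanh")(input_tensor)

# Join the branches along the feature axis: 4 + 6 = 10 features.
joined_out = keras.layers.Concatenate()([out1, out2])
output_tensor = keras.layers.Dense(1)(joined_out)

model = keras.Model(input_tensor, output_tensor)
print(tuple(joined_out.shape), model.output_shape)
```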

    In fact, a Sequential model is also a Model, but one meant for quickly building models without branches.