What I am trying to do:
I want to connect arbitrary layers from different models to create a new Keras model.
What I found so far:
https://github.com/keras-team/keras/issues/4205: using the Model's call method to change the input of another model. My problems with this approach:
When using model.summary() or plot_model(), the encoder only shows up as "Model". If anything, I would say both models should be wrapped, so the config should show [model_base, model_encoder] and not [base_input, base_conv2D, ..., encoder_model].
To be fair, with this approach (https://github.com/keras-team/keras/issues/3021) the point above is actually achievable, but again, it is very inflexible. As soon as I want to cut off some layers at the top or bottom of the base or encoder network, this approach fails.
https://github.com/keras-team/keras/issues/3465: adding new layers to a base model by using any output of the base model. The problems here are similar to the ones above. (A short sketch of both linked approaches follows below.)
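For reference, here is a minimal sketch of the two linked approaches, using the create_base() and create_encoder() helpers from the full code listing further down (the variable names here are just for illustration):

from keras.models import Model
from keras.layers import Flatten, Dense

base = create_base()        # defined in the full listing below
encoder = create_encoder()  # defined in the full listing below

# Issue #4205: call the encoder model on the base model's output tensor.
# The encoder then shows up as a single "Model" layer in summary()/plot_model().
nested = Model(inputs=base.input, outputs=encoder(base.output))

# Issue #3465: build new layers directly on top of a chosen base output.
# This appends fresh layers, but cannot reuse the encoder's existing layers.
x = Dense(10, activation="softmax")(Flatten()(base.output))
extended = Model(inputs=base.input, outputs=x)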
What I have tried:
My approach to connecting any layers from different models:
I was really optimistic at first: summary() and plot_model() gave me exactly what I wanted, so the node graph should be fine, right? But then I ran into errors during training. While the approaches from the "What I found so far" section trained fine, my approach produced the following error message:
File "C:\Anaconda\envs\dlpipe\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 508, in apply_op
(input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.
It might be important to note that I am using TensorFlow as the backend. I was able to trace back the root of this error: it seems something goes wrong when the gradients are calculated. Usually there is a gradient computation for each node, but all the nodes of the base network yield "None" when using my approach. So basically the error occurs in keras/optimizers.py, in get_updates(), when the gradients are calculated (grad = self.get_gradients(loss, params)).
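A quick way to confirm this diagnosis (a sketch, assuming a compiled model on the old Keras API; the model variable is a placeholder, e.g. new_model_3 from the code below):

import keras.backend as K

# After model.compile(...): check which trainable weights receive no gradient.
grads = K.gradients(model.total_loss, model.trainable_weights)
for weight, grad in zip(model.trainable_weights, grads):
    print(weight.name, "->", "None" if grad is None else grad)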
Here is the code (without the training), with all three approaches implemented:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.utils import plot_model

def create_base():
    in_layer = Input(shape=(32, 32, 3), name="base_input")
    x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_1")(in_layer)
    x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_2")(x)
    x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling_2d_1")(x)
    x = Dropout(0.25, name="base_dropout")(x)
    x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_3")(x)
    x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_4")(x)
    x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling2d_2")(x)
    x = Dropout(0.25, name="base_dropout_2")(x)
    return Model(inputs=in_layer, outputs=x, name="base_model")
def create_encoder():
    in_layer = Input(shape=(8, 8, 64))
    x = Flatten(name="encoder_flatten")(in_layer)
    x = Dense(512, activation="relu", name="encoder_dense_1")(x)
    x = Dropout(0.5, name="encoder_dropout_2")(x)
    x = Dense(10, activation="softmax", name="encoder_dense_2")(x)
    return Model(inputs=in_layer, outputs=x, name="encoder_model")
def extend_base(input_model):
    x = Flatten(name="custom_flatten")(input_model.output)
    x = Dense(512, activation="relu", name="custom_dense_1")(x)
    x = Dropout(0.5, name="custom_dropout_2")(x)
    x = Dense(10, activation="softmax", name="custom_dense_2")(x)
    return Model(inputs=input_model.input, outputs=x, name="custom_edit")
def connect_layers(from_tensor, to_layer, clear_inbound_nodes=True):
    try:
        # the current output tensor of to_layer; fails for shared layers
        tmp_output = to_layer.output
    except AttributeError:
        raise ValueError("Connecting to shared layers is not supported!")
    if clear_inbound_nodes:
        # drop the old inbound node so the layer can be re-called on the new tensor
        to_layer.inbound_nodes = []
    else:
        # keep the existing input tensors and add the new one (e.g. for merge layers)
        tensor_list = to_layer.inbound_nodes[0].input_tensors
        tensor_list.append(from_tensor)
        from_tensor = tensor_list
        to_layer.inbound_nodes = []
    # re-call the layer on the new input tensor(s)
    new_output = to_layer(from_tensor)
    # replace the stale output tensor in all downstream nodes
    for out_node in to_layer.outbound_nodes:
        for i, in_tensor in enumerate(out_node.input_tensors):
            if in_tensor == tmp_output:
                out_node.input_tensors[i] = new_output
if __name__ == "__main__":
    base = create_base()
    encoder = create_encoder()

    #new_model_1 = Model(inputs=base.input, outputs=encoder(base.output))
    #plot_model(new_model_1, to_file="plots/new_model_1.png")

    new_model_2 = extend_base(base)
    plot_model(new_model_2, to_file="plots/new_model_2.png")
    print(new_model_2.summary())

    base_layer = base.get_layer("base_dropout_2")
    top_layer = encoder.get_layer("encoder_flatten")
    connect_layers(base_layer.output, top_layer)
    new_model_3 = Model(inputs=base.input, outputs=encoder.output)
    plot_model(new_model_3, to_file="plots/new_model_3.png")
    print(new_model_3.summary())
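For completeness, a minimal training sketch that triggers the error with new_model_3 (this is not part of my original script; the random CIFAR-10-shaped data and the optimizer/loss choice are just placeholders):

import numpy as np

x_dummy = np.random.rand(16, 32, 32, 3).astype("float32")
y_dummy = np.random.rand(16, 10).astype("float32")

new_model_3.compile(optimizer="adam", loss="categorical_crossentropy")
# this fit() call raises the ValueError shown above
new_model_3.fit(x_dummy, y_dummy, epochs=1, batch_size=8)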
I know this is a lot of text and a lot of code, but I feel it is needed to explain the issue here.
EDIT: I just tried the Theano backend, and I think the error gives away more information:
theano.gradient.DisconnectedInputError:
Backtrace when that variable is created:
It seems like every layer from the encoder model keeps some connection to the encoder's input layer via TensorVariables.
So this is what I ended up with for the connect_layers() function:
def connect_layers(from_tensor, to_layer, old_tensor=None):
    # if there is any shared layer after the to_layer, it is not supported
    try:
        tmp_output = to_layer.output
    except AttributeError:
        raise ValueError("Connecting to shared layers is not supported!")
    # check if to_layer has multiple input_tensors and is therefore some sort of merge layer
    if len(to_layer.inbound_nodes[0].input_tensors) > 1:
        tensor_list = to_layer.inbound_nodes[0].input_tensors
        found_tensor = False
        for i, tensor in enumerate(tensor_list):
            # exchange the old tensor with the newly created tensor
            if tensor == old_tensor:
                tensor_list[i] = from_tensor
                found_tensor = True
                break
        if not found_tensor:
            tensor_list.append(from_tensor)
        from_tensor = tensor_list
        to_layer.inbound_nodes = []
    else:
        to_layer.inbound_nodes = []
    new_output = to_layer(from_tensor)
    tmp_out_nodes = to_layer.outbound_nodes[:]
    to_layer.outbound_nodes = []
    # recursively connect all layers after the current to_layer
    for out_node in tmp_out_nodes:
        l = out_node.outbound_layer
        print("Connecting: " + str(to_layer) + " ----> " + str(l))
        connect_layers(new_output, l, tmp_output)
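The call site stays the same as in the main block above; old_tensor is only used internally by the recursion:

base_layer = base.get_layer("base_dropout_2")
top_layer = encoder.get_layer("encoder_flatten")
connect_layers(base_layer.output, top_layer)  # recursively rewires all following encoder layers
new_model_3 = Model(inputs=base.input, outputs=encoder.output)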
As each tensor has all the information about its root tensor via -> owner.inputs -> owner.inputs -> ..., all tensors following the new_output tensor must be updated.
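To illustrate what I mean (a Theano-backend sketch I am adding here, not part of the fix itself), you can walk owner.inputs recursively to find the root variables a tensor still depends on:

def root_inputs(tensor, seen=None):
    """Collect the root variables (e.g. model inputs) a Theano tensor depends on."""
    seen = set() if seen is None else seen
    if tensor.owner is None:
        # a variable without an owner is a graph root (input, shared variable, constant)
        return [tensor]
    roots = []
    for parent in tensor.owner.inputs:
        if parent not in seen:
            seen.add(parent)
            roots.extend(root_inputs(parent, seen))
    return roots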
It was a lot easier to debug this with the Theano backend than with TensorFlow.
I still need to figure out how to deal with shared layers. With the current implementation, it is not possible to connect models that contain a shared layer after the first to_layer.