Tags: deep-learning, cntk

Are shared layers handled efficiently?


Are shared layers handled efficiently in CNTK? (i.e., activation computation is not duplicated)

For example, suppose I have the following expression:

import cntk
from cntk.layers import Dense, For, Sequential

def create_func(shared_layers, out1_layers, out2_layers):
    # ... there would be a with block specifying activations and init... omitted for brevity
    shared_hl_func = For(shared_layers, lambda n: Dense(n), name="Shared Hidden Layers")
    out1_hl_func = For(out1_layers, lambda n: Dense(n), name="Out1 Only Hidden Layers")
    out2_hl_func = For(out2_layers, lambda n: Dense(n), name="Out2 Only Hidden Layers")

    output1_func = Sequential([shared_hl_func, out1_hl_func,
                               Dense(1, activation=None, init=init, name="Out1_Regression_Layer")], name="Out1")
    output2_func = Sequential([shared_hl_func, out2_hl_func,
                               Dense(1, activation=None, init=init, name="Out2_Regression_Layer")], name="Out2")
    return output1_func, output2_func

output1, output2 = create_func([50,25], [25, 10], [25, 10])
my_input = cntk.input_variable((70,))
dual_model = cntk.combine(output1(my_input), output2(my_input))

When evaluating dual_model, will the computation be done efficiently? (i.e., will the first two wider dense layers be computed only once and then shared?) If this is not the case, will constructing the model through explicit function composition help with efficiency?


Solution

  • In your code above, shared_hl_func is evaluated independently in output1_func and output2_func, even though its parameters are shared. To check the computation graph, use cntk.logging.graph.plot to visualize it (as done at the end of the code below).

    To achieve computation sharing, you need to pass the output variable of shared_hl_func into both output1_func and output2_func:

    import cntk
    from cntk.layers import *
    def create_func(shared_layers, out1_layers, out2_layers):
        shared_hl_func = For(shared_layers, lambda n: Dense(n), name="Shared Hidden Layers")
        out1_hl_func = For(out1_layers, lambda n: Dense(n), name="Out1 Only Hidden Layers")
        out2_hl_func = For(out2_layers, lambda n: Dense(n), name="Out2 Only Hidden Layers")
        out1_regr_func = Dense(1, activation=None, name="Out1_Regression_Layer")
        out2_regr_func = Dense(1, activation=None, name="Out2_Regression_Layer")
        @cntk.Function
        def _func(x):
            # ... there would be a with block specifying activations...omitted for brevity
            # compute the shared trunk once on the input
            shared_hl = shared_hl_func(x)
            # both branches consume the same shared_hl variable, so the
            # shared layers are evaluated only once per forward pass
            output1 = Sequential([out1_hl_func, out1_regr_func], name="Out1")(shared_hl)
            output2 = Sequential([out2_hl_func, out2_regr_func], name="Out2")(shared_hl)
            return cntk.combine(output1, output2)
        return _func
    
    output = create_func([50,25], [25, 10], [25, 10])
    my_input = cntk.input_variable((70,))
    dual_model = output(my_input)
    # use plot to visualize the model
    cntk.logging.graph.plot(dual_model, 'dual.pdf')
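
    As a quick sanity check (a minimal sketch, assuming CNTK 2.x and NumPy inputs; the array x_in is made up for illustration), you can evaluate the combined model and confirm it yields both named outputs. In the plotted graph, the shared Dense layers should now appear only once, feeding both branches.

    import numpy as np
    # one random input row of the expected width (70)
    x_in = np.random.rand(1, 70).astype(np.float32)
    # with multiple outputs, eval returns a dict keyed by output variable
    results = dual_model.eval({dual_model.arguments[0]: x_in})
    for out in dual_model.outputs:
        print(out.name, results[out])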