Tags: deep-learning, cntk

Are shared layers handled efficiently?


Are shared layers handled efficiently in CNTK? (i.e., activation computation is not duplicated)

For example, suppose I have the following expression:

import cntk
from cntk.layers import Dense, For, Sequential

def create_func(shared_layers, out1_layers, out2_layers):
    # ... there would be a with block specifying activations and init... omitted for brevity
    shared_hl_func = For(shared_layers, lambda n: Dense(n), name="Shared Hidden Layers")
    out1_hl_func = For(out1_layers, lambda n: Dense(n), name="Out1 Only Hidden Layers")
    out2_hl_func = For(out2_layers, lambda n: Dense(n), name="Out2 Only Hidden Layers")

    output1_func = Sequential([shared_hl_func, out1_hl_func,
                               Dense(1, activation=None, init=init, name="Out1_Regression_Layer")], name="Out1")
    output2_func = Sequential([shared_hl_func, out2_hl_func,
                               Dense(1, activation=None, init=init, name="Out2_Regression_Layer")], name="Out2")
    return output1_func, output2_func

output1, output2 = create_func([50,25], [25, 10], [25, 10])
my_input = cntk.input_variable((70,))
dual_model = cntk.combine(output1(my_input), output2(my_input))

When evaluating dual_model, will the computation be done efficiently? (i.e., will the first two wider dense layers be computed only once and then shared?) If this is not the case, will constructing the model through explicit function composition help with efficiency?


Solution

  • In your code above, shared_hl_func is evaluated independently in output1_func and output2_func, even though its parameters are shared. To check the computation graph, use cntk.logging.graph.plot to visualize it (as done at the end of the code below).

    To achieve computation sharing, you need to pass the output variable of shared_hl_func into both output1_func and output2_func:

    import cntk
    from cntk.layers import *
    def create_func(shared_layers, out1_layers, out2_layers):
        shared_hl_func = For(shared_layers, lambda n: Dense(n), name="Shared Hidden Layers")
        out1_hl_func = For(out1_layers, lambda n: Dense(n), name="Out1 Only Hidden Layers")
        out2_hl_func = For(out2_layers, lambda n: Dense(n), name="Out2 Only Hidden Layers")
        out1_regr_func = Dense(1, activation=None, name="Out1_Regression_Layer")
        out2_regr_func = Dense(1, activation=None, name="Out2_Regression_Layer")
        @cntk.Function
        def _func(x):
            # ... there would be a with block specifying activations...omitted for brevity
            # compute the shared trunk once on the input
            shared_hl = shared_hl_func(x)
            # both branches consume the same shared_hl variable, so the
            # shared layers are evaluated only once per forward pass
            output1 = Sequential([out1_hl_func, out1_regr_func], name="Out1")(shared_hl)
            output2 = Sequential([out2_hl_func, out2_regr_func], name="Out2")(shared_hl)
            return cntk.combine(output1, output2)
        return _func
    
    output = create_func([50,25], [25, 10], [25, 10])
    my_input = cntk.input_variable((70,))
    dual_model = output(my_input)
    # use plot to visualize the model
    cntk.logging.graph.plot(dual_model, 'dual.pdf')
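
    As a quick sanity check (a minimal sketch, assuming CNTK 2.x and NumPy inputs; the array x_in is made up for illustration), you can evaluate the combined model and confirm it yields both named outputs. In the plotted graph, the shared Dense layers should now appear only once, feeding both branches.

    import numpy as np
    # one random input row of the expected width (70)
    x_in = np.random.rand(1, 70).astype(np.float32)
    # with multiple outputs, eval returns a dict keyed by output variable
    results = dual_model.eval({dual_model.arguments[0]: x_in})
    for out in dual_model.outputs:
        print(out.name, results[out])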