Tags: tensorflow, machine-learning, keras, deep-learning, loss-function

Multiple loss functions on (somewhat) overlapping sub-models in Keras


I have a Keras model consisting of an autoencoder with a classifier stacked on top of it. I would like to train it with two loss functions: one that makes sure the autoencoder is fitted reasonably well (for example, mse), and one that evaluates the classifier (for example, categorical_crossentropy). When fitting the model, I would like to minimize a loss that is a linear combination of the two.

from tensorflow.keras import backend as K

# loss functions
def ae_mse_loss(x_true, x_pred):
    ae_loss = K.mean(K.square(x_true - x_pred), axis=1)
    return ae_loss

def clf_loss(y_true, y_pred):
    return K.sum(K.categorical_crossentropy(y_true, y_pred), axis=-1)

def combined_loss(y_true, y_pred):
    ???
    return ae_loss + w1*clf_loss

where w1 is some weight that defines "importance of clf_loss" in the final combined loss.
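For a fixed w1, packaging the combination into a single Keras-compatible loss is easy with a closure; a minimal sketch (make_combined_loss is just an illustrative helper, and it assumes both terms can be computed from the same y_true and y_pred, which is exactly what does not hold in my case):

def make_combined_loss(w1):
    # NOTE: this pattern only works when both terms are computable from the
    # SAME (y_true, y_pred) pair; here the autoencoder and the classifier
    # compare different tensors, which is the whole problem.
    def combined(y_true, y_pred):
        ae_loss = K.mean(K.square(y_true - y_pred), axis=-1)  # per-sample mse
        c_loss = K.categorical_crossentropy(y_true, y_pred)   # per-sample crossentropy
        return ae_loss + w1 * c_loss
    return combined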


# autoencoder
ae_in_layer = Input(shape=(in_dim,), name='ae_in_layer')
ae_interm_layer1 = Dense(interm_dim, activation='relu', name='ae_interm_layer1')(ae_in_layer)
ae_mid_layer = Dense(latent_dim, activation='relu', name='ae_mid_layer')(ae_interm_layer1)
ae_interm_layer2 = Dense(interm_dim, activation='relu', name='ae_interm_layer2')(ae_mid_layer)
ae_out_layer = Dense(in_dim, activation='linear', name='ae_out_layer')(ae_interm_layer2)

ae_model = Model(ae_in_layer, ae_out_layer)
ae_model.compile(optimizer='adam', loss = ae_mse_loss)

# classifier
clf_in_layer = Dense(interm_dim, activation='sigmoid', name='clf_in_layer')(ae_out_layer)
clf_out_layer = Dense(3, activation='softmax', name='clf_out_layer')(clf_in_layer)

clf_model = Model(ae_in_layer, clf_out_layer)  # a Model must start from an Input layer
clf_model.compile(optimizer='adam', loss = combined_loss, metrics = [ae_mse_loss, clf_loss])

What I'm not sure about is how to distinguish y_true and y_pred in the two loss functions, since they refer to true and predicted data at different stages of the model. What I had in mind is something like the following, though I don't see how to implement it, since Keras passes only a single pair of arguments, y_true and y_pred, to a loss function:

def combined_loss(y_true, y_pred):
    ae_loss = ae_mse_loss(x_true_ae, x_pred_ae)   # reconstruction targets/outputs
    c_loss = clf_loss(y_true_clf, y_pred_clf)     # classifier targets/predictions
    return ae_loss + w1*c_loss

I could define this problem as two separate models and train each model separately (as sketched below), but I would really prefer to do it all at once if possible, since that would optimize both objectives simultaneously. I realize this model doesn't make much sense, but it demonstrates the (much more complicated) problem I'm trying to solve in a simple way.
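For completeness, the two-stage alternative I would rather avoid would look roughly like this (a sketch, assuming the ae_model and clf_model defined above and training data X with one-hot labels y):

# two-stage alternative: fit the autoencoder first, then the classifier
ae_model.fit(X, X, epochs=10)

ae_model.trainable = False   # freeze the shared autoencoder layers
clf_model.compile(optimizer='adam', loss='categorical_crossentropy')
clf_model.fit(X, y, epochs=10)  # only the classifier head is updated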

Any suggestions would be appreciated.


Solution

  • Everything you need is available natively in Keras

    You can combine multiple losses automatically using the loss_weights parameter of compile.

    In the example below I reproduce your setup, combining an mse loss for the reconstruction task with a categorical_crossentropy for the classification task:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.layers import Input, Dense
    from tensorflow.keras.models import Model

    in_dim = 10
    interm_dim = 64
    latent_dim = 32
    n_class = 3
    n_sample = 100

    X = np.random.uniform(0, 1, (n_sample, in_dim))
    y = tf.keras.utils.to_categorical(np.random.randint(0, n_class, n_sample))
    
    # autoencoder
    ae_in_layer = Input(shape=(in_dim,), name='ae_in_layer')
    ae_interm_layer1 = Dense(interm_dim, activation='relu', name='ae_interm_layer1')(ae_in_layer)
    ae_mid_layer = Dense(latent_dim, activation='relu', name='ae_mid_layer')(ae_interm_layer1)
    ae_interm_layer2 = Dense(interm_dim, activation='relu', name='ae_interm_layer2')(ae_mid_layer)
    ae_out_layer = Dense(in_dim, activation='linear', name='ae_out_layer')(ae_interm_layer2)
    
    # classifier
    clf_in_layer = Dense(interm_dim, activation='sigmoid', name='clf_in_layer')(ae_out_layer)
    clf_out_layer = Dense(n_class, activation='softmax', name='clf_out_layer')(clf_in_layer)
    
    model = Model(ae_in_layer, [ae_out_layer, clf_out_layer])
    model.compile(optimizer='adam', 
                  loss = {'ae_out_layer':'mse', 'clf_out_layer':'categorical_crossentropy'},
                  loss_weights = {'ae_out_layer':1., 'clf_out_layer':0.5})
    
    model.fit(X, [X,y], epochs=10)
    

    In this specific case, the total loss being minimized is 1*ae_out_layer_loss + 0.5*clf_out_layer_loss
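
    Since the outputs are named, the targets can also be passed as a dict keyed by layer name, and custom loss functions (like the ae_mse_loss from the question) can replace the string identifiers; a small usage sketch under those assumptions:

    model.compile(optimizer='adam',
                  loss={'ae_out_layer': ae_mse_loss, 'clf_out_layer': 'categorical_crossentropy'},
                  loss_weights={'ae_out_layer': 1., 'clf_out_layer': 0.5})

    history = model.fit(X, {'ae_out_layer': X, 'clf_out_layer': y}, epochs=10)

    # each output's loss is also reported separately during training, e.g.
    # history.history contains 'loss', 'ae_out_layer_loss', 'clf_out_layer_loss'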