
How to replace the loss function during training in tensorflow.keras


I want to replace the loss function of my neural network during training. This is the network:

model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu"))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Dropout(0.25))
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
model.add(tensorflow.keras.layers.Dropout(0.5))
model.add(tensorflow.keras.layers.Dense(output_classes, activation="softmax"))
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))

Now I want to replace tensorflow.keras.losses.categorical_crossentropy with another loss, so I wrote this:

model.compile(loss=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_improve, y_improve, epochs=1, validation_data=(x_test, y_test)) #FIXME bug during training

but I get this error:

ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].

Why? How can I fix it? Is there another way to change the loss function?

Thanks


Solution

  • The most straightforward answer I can give is: switch to PyTorch if you want to play this kind of game. In PyTorch you write the training and evaluation loops yourself, so switching from one loss function to another takes just an if statement.
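    To make the "just an if statement" point concrete without depending on a particular framework, here is a minimal NumPy sketch of a hand-written training loop (a linear model trained with full-batch gradient descent) whose loss is chosen by an if on the epoch number. The data, learning rate, switch epoch, and the MSE-to-MAE switch are illustrative assumptions, not the answer's setup; in PyTorch the same if would sit inside your own training loop.

```python
import numpy as np

def mse_and_grad(pred, y):
    """Mean squared error and its gradient with respect to the predictions."""
    diff = pred - y
    return np.mean(diff ** 2), 2.0 * diff / diff.size

def mae_and_grad(pred, y):
    """Mean absolute error and its subgradient with respect to the predictions."""
    diff = pred - y
    return np.mean(np.abs(diff)), np.sign(diff) / diff.size

def train(X, y, epochs=200, switch_at=100, lr=0.05):
    """Full-batch gradient descent on a linear model pred = X @ w + b."""
    w = np.zeros(X.shape[1])
    b = 0.0
    used, history = [], []
    for epoch in range(epochs):
        # Because we own the loop, switching the loss is a single if statement.
        if epoch < switch_at:
            name, loss_and_grad = "mse", mse_and_grad
        else:
            name, loss_and_grad = "mae", mae_and_grad
        pred = X @ w + b
        loss, dpred = loss_and_grad(pred, y)
        # Chain rule through pred = X @ w + b.
        w -= lr * (X.T @ dpred)
        b -= lr * dpred.sum()
        used.append(name)
        history.append(loss)
    return w, b, used, history

# Hypothetical toy data: y = X @ true_w + 3 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 3.0 + rng.normal(scale=0.1, size=200)

w, b, used, history = train(X, y)
print(used[99], used[100])  # mse mae
```

    Nothing is compiled or cached here, so there is no way for the old loss to "stick": each epoch simply calls whichever function the if selected.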

    Also, I see in your code that you want to switch from cross-entropy to mean squared error. The former is suited to classification and the latter to regression, so this is not really a sensible switch. In the code that follows I switch from mean squared error to mean squared logarithmic error, which are both losses suitable for regression.
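    One way to see why cross-entropy is the usual choice for a softmax classifier: when the network is confidently wrong, the cross-entropy gradient with respect to the logits stays large (it is simply softmax(z) - y), while the MSE gradient is multiplied by the softmax Jacobian and almost vanishes. A small NumPy check with made-up logits for a 3-class problem:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def grad_ce(z, y):
    # d/dz of -sum(y * log(softmax(z))) is softmax(z) - y
    return softmax(z) - y

def grad_mse(z, y):
    # d/dz of mean((softmax(z) - y) ** 2), via the softmax Jacobian
    p = softmax(z)
    jac = np.diag(p) - np.outer(p, p)  # dp_i/dz_j = p_i * (delta_ij - p_j), symmetric
    return jac @ (2.0 * (p - y) / y.size)

y = np.array([0.0, 1.0, 0.0])   # true class is 1
z = np.array([10.0, 0.0, 0.0])  # network confidently predicts class 0

print(np.linalg.norm(grad_ce(z, y)))   # ~1.41
print(np.linalg.norm(grad_mse(z, y)))  # ~1e-4
```

    With MSE the model barely learns from its worst mistakes, which is why it is a poor fit on top of a softmax output even though it is technically computable.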

    Although other answers offer solutions to your question (see change-loss-function-dynamically-during-training), it is not clear whether you can trust their results. Some people found that, even with a customised function, Keras sometimes keeps training with the first loss.

    Solution:

    My solution is based on train_on_batch, which lets us train the model batch by batch in a for loop and therefore stop at any point to recompile the model with a new loss function. Please note that recompiling the model does not reset the weights (see: Does recompiling a model re-initialize the weights?).
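    A quick way to check the "recompiling keeps the weights" claim yourself is sketched below, assuming TensorFlow 2.x with tf.keras; the toy data and layer sizes are made up. One subtlety: when optimizer='adam' is passed as a string, as in the code below, each compile builds a fresh Adam optimizer, so its moment estimates start over even though the weights are kept. Passing the same optimizer instance to both compile calls, as this sketch does, keeps that state too.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy regression data with non-negative targets (so MSLE is meaningful).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4)).astype("float32")
y = (X.sum(axis=1, keepdims=True) ** 2).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Keep a handle on the optimizer so the second compile can reuse it.
opt = tf.keras.optimizers.Adam(1e-3)
model.compile(optimizer=opt, loss="mean_squared_error")
model.fit(X, y, epochs=1, batch_size=16, verbose=0)

weights_before = model.get_weights()

# Recompile with a different loss, reusing the same optimizer instance.
model.compile(optimizer=opt, loss="mean_squared_logarithmic_error")
weights_after = model.get_weights()

# compile() does not touch the weights.
print(all(np.array_equal(a, b) for a, b in zip(weights_before, weights_after)))  # True

# Training continues from the same weights, now with the new loss.
model.fit(X, y, epochs=1, batch_size=16, verbose=0)
```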

    The dataset can be found here: Boston housing dataset

    # Regression Example With Boston Dataset: Standardized and Larger
    from pandas import read_csv
    from keras.models import Sequential
    from keras.layers import Dense
    from sklearn.model_selection import train_test_split
    from keras.losses import mean_squared_error, mean_squared_logarithmic_error
    from matplotlib import pyplot
    import matplotlib.pyplot as plt
    
    # load dataset
    dataframe = read_csv("housing.csv", sep=r"\s+", header=None)
    dataset = dataframe.values
    
    # split into input (X) and output (Y) variables
    X = dataset[:,0:13]
    y = dataset[:,13]
    
    trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.33, random_state=42)
    
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    batch_size = 25
    
    # manually define a dict to store the scores of every epoch
    history = {}
    history['history'] = {}
    history['history']['loss'] = []
    history['history']['mean_squared_error'] = []
    history['history']['mean_squared_logarithmic_error'] = []
    history['history']['val_loss'] = []
    history['history']['val_mean_squared_error'] = []
    history['history']['val_mean_squared_logarithmic_error'] = []
    
    # first compiling with mse
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
    
    # define number of iterations in training and test
    # ceiling division, so the last (smaller) batch is never dropped
    train_iter = (trainX.shape[0] + batch_size - 1) // batch_size
    test_iter = (testX.shape[0] + batch_size - 1) // batch_size
    
    for epoch in range(2):
        
        # train iterations 
        loss, mse, msle = 0, 0, 0
        for i in range(train_iter):
            
            start = i*batch_size
            end = i*batch_size + batch_size
            batchX = trainX[start:end,]
            batchy = trainy[start:end,]
            
            loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
                    
            loss += loss_
            mse += mse_
            msle += msle_
        
        history['history']['loss'].append(loss/train_iter)
        history['history']['mean_squared_error'].append(mse/train_iter)
        history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
        
        # test iterations 
        val_loss, val_mse, val_msle = 0, 0, 0
        for i in range(test_iter):
            
            start = i*batch_size
            end = i*batch_size + batch_size
            batchX = testX[start:end,]
            batchy = testy[start:end,]
            
            val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
            
            val_loss += val_loss_
            val_mse += val_mse_
            val_msle += val_msle_
            
        history['history']['val_loss'].append(val_loss/test_iter)
        history['history']['val_mean_squared_error'].append(val_mse/test_iter)
        history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
        
    # recompiling the model with new loss
    model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
    
    for epoch in range(2):
        
        # train iterations 
        loss, mse, msle = 0, 0, 0
        for i in range(train_iter):
            
            start = i*batch_size
            end = i*batch_size + batch_size
            batchX = trainX[start:end,]
            batchy = trainy[start:end,]
        
            loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
            
            loss += loss_
            mse += mse_
            msle += msle_
            
        history['history']['loss'].append(loss/train_iter)
        history['history']['mean_squared_error'].append(mse/train_iter)
        history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
         
        # test iterations 
        val_loss, val_mse, val_msle = 0, 0, 0
        for i in range(test_iter):
            
            start = i*batch_size
            end = i*batch_size + batch_size
            batchX = testX[start:end,]
            batchy = testy[start:end,]
            
            val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
            
            val_loss += val_loss_
            val_mse += val_mse_
            val_msle += val_msle_
            
        history['history']['val_loss'].append(val_loss/test_iter)
        history['history']['val_mean_squared_error'].append(val_mse/test_iter)
        history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
        
    # Some plots to check what is going on   
    # loss function 
    pyplot.subplot(311)
    pyplot.title('Loss')
    pyplot.plot(history['history']['loss'], label='train')
    pyplot.plot(history['history']['val_loss'], label='test')
    pyplot.legend()
    
    # Only mean squared error 
    pyplot.subplot(312)
    pyplot.title('Mean Squared Error')
    pyplot.plot(history['history']['mean_squared_error'], label='train')
    pyplot.plot(history['history']['val_mean_squared_error'], label='test')
    pyplot.legend()
    
    # Only mean squared logarithmic error 
    pyplot.subplot(313)
    pyplot.title('Mean Squared Logarithmic Error')
    pyplot.plot(history['history']['mean_squared_logarithmic_error'], label='train')
    pyplot.plot(history['history']['val_mean_squared_logarithmic_error'], label='test')
    pyplot.legend()
    plt.tight_layout()
    pyplot.show()
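    The two training phases in the code above repeat the same batching and averaging logic. A possible refactor is sketched below; iter_batches and run_epoch are hypothetical helper names, not part of Keras. It also weights each batch by its size, so a smaller last batch does not skew the epoch average.

```python
import math
import numpy as np

def iter_batches(X, y, batch_size):
    """Yield consecutive (X, y) mini-batches; the last batch may be smaller."""
    n_batches = math.ceil(len(X) / batch_size)
    for i in range(n_batches):
        start = i * batch_size
        yield X[start:start + batch_size], y[start:start + batch_size]

def run_epoch(step_fn, X, y, batch_size):
    """Call step_fn(batch_x, batch_y) on every batch and return the
    per-sample weighted mean of the scores it returns."""
    totals = None
    seen = 0
    for batch_x, batch_y in iter_batches(X, y, batch_size):
        scores = np.atleast_1d(np.asarray(step_fn(batch_x, batch_y), dtype=float))
        weighted = scores * len(batch_x)
        totals = weighted if totals is None else totals + weighted
        seen += len(batch_x)
    return totals / seen

# Self-check with a dummy step function (stands in for model.train_on_batch).
X = np.arange(10, dtype=float).reshape(10, 1)
y = np.arange(10, dtype=float)
print(run_epoch(lambda bx, by: [by.mean()], X, y, batch_size=4))  # [4.5]
```

    With the model above, run_epoch(model.train_on_batch, trainX, trainy, batch_size) would return [loss, mse, msle] for one training epoch and run_epoch(model.test_on_batch, testX, testy, batch_size) the validation scores. This assumes tf.keras 2.x, where train_on_batch and test_on_batch return scores for the current batch only (reset_metrics=True by default).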
    

    The resulting plot confirms that the loss function changes after the second epoch:

    [Figure: train and test curves in three panels (Loss, Mean Squared Error, Mean Squared Logarithmic Error)]

    The drop in the loss curve happens because the model switches from plain mean squared error to the logarithmic one, which has much lower values. Printing the scores also proves that the loss really changed:
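    To see why the values drop so sharply, compare both losses on the same errors. MSLE measures the difference of log(1 + x), so it behaves roughly like a squared relative error, while MSE squares the raw difference. A small NumPy sketch; the MSLE function follows the form Keras documents (clip, then log(1 + x)), and the targets and predictions are illustrative values, not outputs of the model above:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def msle(y_true, y_pred, eps=1e-7):
    # Clip to a small positive value, take log(1 + x), then the mean squared difference.
    first = np.log1p(np.maximum(y_pred, eps))
    second = np.log1p(np.maximum(y_true, eps))
    return np.mean((first - second) ** 2)

y_true = np.array([24.0, 21.6, 34.7])  # illustrative targets
y_pred = np.array([20.0, 25.0, 30.0])  # illustrative predictions

print(mse(y_true, y_pred))   # ~16.55
print(msle(y_true, y_pred))  # ~0.023
```

    The same prediction errors give a value several hundred times smaller under MSLE, which matches the jump from hundreds to single digits in the printed losses.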

    print(history['history']['loss'])
    [599.5209197998047, 570.4041115897043, 3.8622902120862688, 2.1578191178185597]
    print(history['history']['mean_squared_error'])
    [599.5209197998047, 570.4041115897043, 510.29034205845426, 425.32058388846264]
    print(history['history']['mean_squared_logarithmic_error'])
    [8.624503476279122, 6.346359729766846, 3.8622902120862688, 2.1578191178185597]
    

    In the first two epochs the loss values equal the mean_squared_error values, and in the third and fourth epochs they equal the mean_squared_logarithmic_error values, i.e. the new loss that was set. So it seems that train_on_batch does allow you to change the loss function. Nevertheless, I want to stress again that this is basically what one would do in PyTorch to achieve the same result, with the difference that PyTorch's behaviour (in this scenario, and in my opinion) is more reliable.
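    For comparison with the PyTorch approach, tf.keras also lets you write the loop yourself with tf.GradientTape, in which case the loss switch is again a plain if. A minimal sketch, assuming TensorFlow 2.x; the toy data, the model, and switch_epoch are made-up placeholders, not the question's data:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data with positive targets.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(256, 3)).astype("float32")
y = X @ np.array([[2.0], [1.0], [3.0]], dtype="float32") + 1.0
dataset = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(256, seed=0).batch(32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-2)
mse = tf.keras.losses.MeanSquaredError()
msle = tf.keras.losses.MeanSquaredLogarithmicError()

switch_epoch = 2  # hypothetical: epoch at which the loss changes
loss_used = []
for epoch in range(4):
    # The whole "loss switch" is this if expression, evaluated eagerly every epoch.
    loss_fn = mse if epoch < switch_epoch else msle
    for batch_x, batch_y in dataset:
        with tf.GradientTape() as tape:
            pred = model(batch_x, training=True)
            loss = loss_fn(batch_y, pred)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    loss_used.append(loss_fn.name)
    print(epoch, loss_fn.name, float(loss))
```

    This loop runs eagerly, so the if is re-evaluated every epoch. If you later wrap the step in tf.function, a Python-level choice of loss is captured when the function is traced, so keep one step function per loss instead of reassigning a global.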