Tags: python, tensorflow, keras, neural-network, loss-function

Does Google Colab have its own random seed?


Does Google Colab have its own random seed that has nothing to do with TensorFlow or Numpy?

I used the set_random_seed() function from keras.utils to set the seeds of NumPy, TensorFlow, and Python's random library all to 777, just in case.

However, each time I run the code, the loss value and accuracy of the neural network come out differently.

Here's the TensorFlow neural network code I wrote and the results of two runs.

import numpy as np
import glob
import keras
import pickle
import time
import tensorflow as tf
from tensorflow import GradientTape
from keras.losses import BinaryCrossentropy
from keras.metrics import BinaryAccuracy
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense, GRU, Convolution2D, Flatten, TimeDistributed, MaxPool2D, Bidirectional
import random

random.seed(777)
tf.random.set_seed(777)
np.random.seed(777)

tf.keras.utils.set_random_seed(777)

class Trainer:
    def __init__(self, model, epochs):
        self.model = model
        self.epochs = epochs

    def getTrainData(self, filePath):
        data = np.load(filePath)

        label = data["label"]
        TTMelSpectrogram = data["TTMelSpectrogram"]
        
        length, height, width = TTMelSpectrogram.shape

        TTMelSpectrogram = TTMelSpectrogram.reshape((1, length, height, width, 1))
        label = label.reshape((1,-1))

        return label, TTMelSpectrogram

    def train(self, dataFilePaths, optimizer, loss_fn, train_metric, epochs_per_log=1):
        print("Start Training")
        start_time = time.time()
        for epoch in range(self.epochs):
            for step, dataFilePath in enumerate(dataFilePaths):
                label, TTMelSpectrogram = self.getTrainData(dataFilePath)
                
                with GradientTape() as tape:
                    logits = self.model(TTMelSpectrogram, training=True)
                    loss_value = loss_fn(label, logits)

                grads = tape.gradient(loss_value, self.model.trainable_weights)
                optimizer.apply_gradients(zip(grads, self.model.trainable_weights))
                train_metric.update_state(label, logits)
                
                if step == (len(dataFilePaths)-1) and epoch % epochs_per_log == 0:
                    train_acc = train_metric.result()  # use the metric passed in, not the global
                    takenTime = round(time.time() - start_time, 1)
                    start_time = time.time()
                    print("Epoch: {0}/{1} - Loss: {2} / Accuracy: {3} - Time taken: {4}s".format(epoch, self.epochs, float(loss_value),float(train_acc), takenTime))

raw_path = "/content/drive/MyDrive/Music Tagger Data/DataSet/*.npz"
data_filePaths = glob.glob(raw_path)

melSpectrogramShape = (None, 32, 489, 1)

model = Sequential()

model.add(TimeDistributed(Convolution2D(128, (5, 5), activation='relu'), input_shape=melSpectrogramShape)) #, kernel_initializer=tf.keras.initializers.HeNormal()
model.add(TimeDistributed(MaxPool2D(pool_size=(2,2))))

model.add(TimeDistributed(Convolution2D(128, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPool2D(pool_size=(2,2))))

model.add(TimeDistributed(Convolution2D(64, (2, 2), activation='relu')))
model.add(TimeDistributed(MaxPool2D(pool_size=(2,2))))

model.add(TimeDistributed(Flatten()))

model.add(TimeDistributed(Dense(128, activation='relu')))

model.add(Bidirectional(GRU(128, activation='tanh', return_sequences=True)))
model.add(Bidirectional(GRU(64, activation='tanh')))

model.add(Dense(moodAmount, activation='sigmoid'))

model.summary()

optimizer = Adam()
train_acc_metric = BinaryAccuracy()
loss_function = BinaryCrossentropy()

PredictionLog() # A function that outputs the predicted accuracy and predicted answers for each x, y pair; its internal code is omitted for brevity.
print()

trainer = Trainer(model=model,
                  epochs=30)

model.compile(optimizer='adam', loss='binary_crossentropy')

with tf.device('/device:GPU:0'):
  trainer.train(dataFilePaths=data_filePaths,
                loss_fn=loss_function,
                optimizer=optimizer,
                train_metric=train_acc_metric,
                epochs_per_log=5)

print()
PredictionLog()

model.save("/content/drive/MyDrive/Music Tagger Data/model D46.h5")

First Time Running:

Epoch: 0/30 - Loss: 0.4456087350845337 / Accuracy: 0.7730434536933899 - Time taken: 12.6s

Epoch: 5/30 - Loss: 0.45313453674316406 / Accuracy: 0.7933333516120911 - Time taken: 24.0s

Epoch: 10/30 - Loss: 0.46146297454833984 / Accuracy: 0.7961264848709106 - Time taken: 23.8s

Epoch: 15/30 - Loss: 0.41201579570770264 / Accuracy: 0.8028261065483093 - Time taken: 24.7s

Epoch: 20/30 - Loss: 0.1334078013896942 / Accuracy: 0.827163577079773 - Time taken: 23.9s

Epoch: 25/30 - Loss: 0.0628323033452034 / Accuracy: 0.8586287498474121 - Time taken: 23.6s

Second Time Running:

Epoch: 0/30 - Loss: 0.4462859332561493 / Accuracy: 0.7721739411354065 - Time taken: 13.7s

Epoch: 5/30 - Loss: 0.4787677824497223 / Accuracy: 0.7933333516120911 - Time taken: 23.9s

Epoch: 10/30 - Loss: 0.4732694923877716 / Accuracy: 0.7980237007141113 - Time taken: 24.3s

Epoch: 15/30 - Loss: 0.47321563959121704 / Accuracy: 0.7993478178977966 - Time taken: 24.1s

Epoch: 20/30 - Loss: 0.4715110659599304 / Accuracy: 0.8061283826828003 - Time taken: 25.1s

Epoch: 25/30 - Loss: 0.22313082218170166 / Accuracy: 0.8229765892028809 - Time taken: 23.9s

The loss value in the last epochs is significantly different between the two runs. Fortunately, there was not much difference in accuracy in these two attempts, but sometimes the loss value goes up to 0.6 and the model predicts zero for everything.


Solution

  • The training results will not be exactly the same across different runs (they always vary to some extent, though not drastically). This does not depend solely on the random seed, as there are other variables/factors involved. As you mentioned already, your model's accuracy in both runs is not very different, and both runs follow a similar trend.

    As for the loss value, the fluctuations/instability you are observing in different runs could be due to a number of reasons like the following:

    • Data is quite imbalanced
    • Inadequate data compared to the complexity of the model
    • Distributions of the train and test set are significantly different
    • Random poor initialization of model weights (see the initializer sketch just after this list)
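
    As a rough sketch of that last point, the layer initializers themselves can be given an explicit seed so every run starts from the same weights. This builds on the kernel_initializer already hinted at in the commented-out part of the question's code; HeNormal and the value 777 here are just illustrative assumptions, not a recommendation:

    from keras.layers import Convolution2D, Dense
    from tensorflow.keras.initializers import HeNormal

    # Illustrative only: fixed-seed initializers so every run starts from the same weights.
    conv = Convolution2D(128, (5, 5), activation='relu',
                         kernel_initializer=HeNormal(seed=777))
    dense = Dense(128, activation='relu',
                  kernel_initializer=HeNormal(seed=777))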

    You also mentioned that the loss value sometimes goes up to 0.6; I'm guessing you are getting these high loss values near the end of your training. This is most likely due to overfitting, and to overcome it you can try adding some regularization (e.g. dropout layers).
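
    A minimal sketch of how that could look in the model above (the dropout rates are arbitrary illustrations, not tuned values):

    from keras.layers import Dropout

    # Sketch only: dropout slotted into the question's Sequential model after the
    # TimeDistributed Dense block, plus GRU's built-in dropout argument.
    model.add(TimeDistributed(Dense(128, activation='relu')))
    model.add(TimeDistributed(Dropout(0.3)))   # drops 30% of the dense activations

    model.add(Bidirectional(GRU(128, activation='tanh', return_sequences=True, dropout=0.3)))
    model.add(Bidirectional(GRU(64, activation='tanh', dropout=0.3)))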