Tags: python, tensorflow, keras, google-colaboratory, tf.keras

Why does my model always return 0 val loss in Keras/TensorFlow when trained on Google Colab?


I'm trying to train a simple model on Colab, but it always returns 0 validation loss when I run my own code with !python train.py. However, the same code runs perfectly fine on my own computer. Does anyone know the reason?

Epoch 1/500
2020-06-17 19:53:31.689547: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-17 19:53:31.889892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
47/47 - 7s - loss: 52.6930 - mse: 2876.5457 - mae: 52.5915 - val_loss: 0.1029 - val_mse: 0.0000e+00 - val_mae: 0.0000e+00

The code for training:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    def build_model(self):
        new_model = self.base_model

        opt = Adam(lr=self.lr)
        new_model.compile(loss='mae',
                          optimizer=opt,
                          metrics=['mse', 'mae'])

        return new_model

    def train(self, base_epochs=500,
              save_model=False, save_path=None, cal_time=True):
        model = self.build_model()

        early_stopping = EarlyStopping(monitor='val_loss',
                                       patience=50,
                                       mode='min')
        save_best = ModelCheckpoint(filepath=save_path,
                                    monitor='val_loss',
                                    save_best_only=True,
                                    mode='min')
        cp_callback = [early_stopping, save_best]

        history = model.fit(
            x=self.standardize(self.train_data),
            y=self.train_labels,
            batch_size=self.batch_size,
            epochs=base_epochs,
            verbose=2,
            callbacks=cp_callback,
            validation_data=[self.standardize(self.val_data), self.val_labels],
        )
        return history
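
The standardize helper called above isn't shown here; a minimal sketch of a typical implementation (not necessarily the exact code used) could be:

    import numpy as np

    def standardize(self, data):
        # Hypothetical sketch of the helper: cast to float32 and scale
        # the whole array to zero mean and unit variance, guarding
        # against a zero standard deviation.
        data = np.asarray(data, dtype='float32')
        std = data.std()
        return (data - data.mean()) / (std if std > 0.0 else 1.0)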

I also wrote code to check the image data.

    def check_data(self):
        data_name = ['Train Data', 'Train Labels', 'Validation Data', 'Validation Labels']
        for i, data in enumerate([self.train_data, self.train_labels, self.val_data, self.val_labels]):
            print('{0:<20}:  shape-{1:<20} type--{2}' \
                  .format(data_name[i], str(data.shape), data.dtype))

And here's the information about the data; they're all NumPy arrays:

Train Data          :  shape-(3000, 224, 224, 1)  type--float32
Train Labels        :  shape-(3000, 2)            type--float64
Validation Data     :  shape-(200, 224, 224, 1)   type--float32
Validation Labels   :  shape-(200, 2)             type--float64

Solution

  • OK, I finally found the problem:

    I was passing a list to validation_data=, which should be a tuple according to the official documentation.

    It should be:

    history = model.fit(
        x=self.standardize(self.train_data),
        y=self.train_labels,
        batch_size=self.batch_size,
        epochs=base_epochs,
        verbose=2,
        callbacks=cp_callback,
        validation_data=(self.standardize(self.val_data), self.val_labels),
    )
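
    To double-check the fix, here is a small self-contained sketch with random stand-in data and a toy model (hypothetical shapes and values, not my real setup) showing that passing validation_data as a tuple reports non-zero validation metrics:

    import numpy as np
    import tensorflow as tf

    # Random stand-in data with the same shapes as above (toy values).
    x_train = np.random.rand(32, 224, 224, 1).astype('float32')
    y_train = np.random.rand(32, 2).astype('float32')
    x_val = np.random.rand(8, 224, 224, 1).astype('float32')
    y_val = np.random.rand(8, 2).astype('float32')

    # A tiny placeholder model, just to exercise fit() with a validation tuple.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(224, 224, 1)),
        tf.keras.layers.Dense(2),
    ])
    model.compile(loss='mae', optimizer='adam', metrics=['mse', 'mae'])

    history = model.fit(
        x_train, y_train,
        batch_size=8,
        epochs=1,
        verbose=2,
        validation_data=(x_val, y_val),  # tuple, not list
    )
    print(history.history['val_mae'])  # non-zero validation metric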