keras deep-learning neural-network classification recommendation-engine

Test predictions are all the same


I'm trying to build a Keras model with binary outputs for a recommendation task.

After I build and train it, it seems to be converging, with the loss dropping and both training and validation F1 improving:

EPOCH:  0
Train on 4641920 samples, validate on 1160480 samples
Epoch 1/1
4641920/4641920 [==============================] - 93s 20us/step - loss: 0.0317 - val_loss: 0.0262
TRAIN F1:  0.16175450762829402
VAL F1:  0.09613703897919944
EPOCH:  1
Train on 4641920 samples, validate on 1160480 samples
Epoch 1/1
4641920/4641920 [==============================] - 100s 22us/step - loss: 0.0238 - val_loss: 0.0256
TRAIN F1:  0.2667970500753779
VAL F1:  0.1608853650479022

But when I predict on the test set, the rounded outputs are all 0s (the validation set shows no such problem):

val_prediction = model.predict(x=[val_customer_id, val_vendor_id], verbose=1, batch_size=384)
print(np.unique(val_prediction.round()))

1160480/1160480 [==============================] - 6s 5us/step
[0. 1.]



test_prediction = model.predict(x=[test_customer_id, test_vendor_id], verbose=1, batch_size=384)
print(np.unique(test_prediction.round()))

1672000/1672000 [==============================] - 8s 5us/step
[0.]

I'm really struggling here, and any help would be appreciated.


Solution

  • There are two possible reasons:

    1- Your train and test sets are too sparse, so the training data covers the test data poorly and the model performs badly on the test set.

    2- Your model is overfitting.
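
Before choosing between the two, it may help to look at the raw outputs rather than the rounded ones, since `np.unique(prediction.round())` only shows which side of 0.5 the values fall on. The sketch below is a hypothetical helper (the name `summarize_probabilities` and the default threshold are assumptions, not part of the original code) and assumes the model ends in a sigmoid, so outputs lie in [0, 1]:

```python
def summarize_probabilities(probs, threshold=0.5):
    """Summarize raw model outputs before any rounding.

    `probs` is a flat sequence of floats, e.g. model.predict(...).ravel().
    """
    values = [float(p) for p in probs]
    if not values:
        raise ValueError("probs is empty")
    # Count outputs that would round up to 1 at the given threshold.
    positives = sum(1 for v in values if v > threshold)
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "positive_fraction": positives / len(values),
    }
```

Calling it on both the validation and the test predictions shows whether the test outputs sit just below 0.5 or are close to 0 everywhere, which are quite different situations.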

    Answer 1:

    If the dataset is too sparse, you need to re-split it so that the train, validation and test sets resemble each other. As a quick check, move some validation rows into the test set and see whether the predictions change.
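
One way to check how well the training data covers the test set is to measure how many test rows refer to IDs that also occur in training. This is a minimal sketch: `seen_fraction` is a hypothetical helper, and the array names `train_customer_id` and `train_vendor_id` in the usage note are assumed by analogy with the `val_` and `test_` arrays in the question.

```python
def seen_fraction(train_ids, test_ids):
    """Fraction of test rows whose ID also occurs in the training data.

    Both arguments are 1-D sequences of hashable IDs (lists or 1-D arrays).
    """
    known = set(train_ids)
    test_list = list(test_ids)
    if not test_list:
        raise ValueError("test_ids is empty")
    seen = sum(1 for i in test_list if i in known)
    return seen / len(test_list)
```

For example, `seen_fraction(train_customer_id, test_customer_id)` and `seen_fraction(train_vendor_id, test_vendor_id)` can be compared with the same ratios for the validation set. A much lower value on the test set would suggest that many test IDs were never seen during training.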

    Answer 2:

    If your model is overfitting, any of the following may help:

    a- Adding new layers

    b- Adding dropout

    c- Increasing the batch size
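
To make the dropout option concrete: during training, dropout zeroes a random fraction of the activations and rescales the rest, and at inference time it does nothing. The pure-Python sketch below illustrates that mechanism only. It is not Keras's implementation, and the function name and signature are hypothetical.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout over a flat list of activations.

    Each value is dropped (set to 0.0) with probability `rate`; kept values
    are scaled by 1 / (1 - rate) so the expected sum stays the same.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Inference mode (or rate 0): pass values through unchanged.
        return list(activations)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]
```

In a Keras model the same effect comes from inserting a `Dropout(rate)` layer after the layers you want to regularize, for example after the dense layers that follow the customer and vendor inputs.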