Tags: python, tensorflow, machine-learning, keras, deep-learning

High accuracy during training and validation, low accuracy during prediction with the same dataset


I'm trying to train a Keras model. Accuracy is high during training and validation (I'm using the F1 score as my metric, but plain accuracy is high as well). However, when I predict on a dataset afterwards I get much lower accuracy, even when I predict on the training set itself. So I don't think this is an overfitting problem. What is the problem, then?

import numpy as np
import matplotlib.pyplot as plt
import talos as ta  # provides the f1score Keras metric used below
from sklearn.model_selection import StratifiedKFold, train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

skf = StratifiedKFold(n_splits=5)
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # carve a validation set out of the training fold, stratified by class
    X_train, x_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.5, stratify=y_train)
    y_train = encode(y_train)  # one-hot encode labels (see encode() below)
    y_val = encode(y_val)
    
    model = Sequential()
    model.add(Dense(50,input_dim=X_train.shape[1],activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(25,activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(10,activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))   
    
    opt = Adam(learning_rate=0.001)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['acc', ta.utils.metrics.f1score])  
    history = model.fit(X_train, y_train, 
                        validation_data=(x_val, y_val),
                        epochs=5000,
                        verbose=0)
    
    plt.plot(history.history['f1score'])
    plt.plot(history.history['val_f1score'])
    plt.title('model f1 score')
    plt.ylabel('f1score')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper left')
    plt.show()
    break  # only run the first fold

The resulting training curves show high f1 scores on both the training and validation sets.

And the code for prediction:

from sklearn.metrics import f1_score

y_pred = model.predict(X_train)   # predict on the training fold itself
y_pred = decode(y_pred)           # softmax probabilities -> class labels
y_train_t = decode(y_train)       # one-hot labels -> class labels
print(f1_score(y_train_t, y_pred))

The result is 0.64, which is far below the roughly 0.9 reported during training.
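One way to narrow this down is to score the same data with model.evaluate, which applies the compiled metrics directly (a minimal sketch, assuming the trained model and one-hot y_train from above):

# evaluate() uses the compiled loss and metrics ('acc' and f1score);
# if these stay high while sklearn's f1_score is low, the in-training
# metrics themselves are suspect rather than predict().
loss, acc, f1 = model.evaluate(X_train, y_train, verbose=0)
print(f"Keras evaluate -> acc: {acc:.3f}, f1score: {f1:.3f}")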

My encode and decode functions:

def encode(y):
    # one-hot encode binary labels: 1 -> [0, 1], 0 -> [1, 0]
    Y = np.zeros((y.shape[0], 2))
    for i in range(len(y)):
        if y[i] == 1:
            Y[i][1] = 1
        else:
            Y[i][0] = 1
    return Y

def decode(y):
    # map softmax outputs (or one-hot rows) back to class labels 0/1
    Y = np.zeros((y.shape[0]))
    for i in range(len(y)):
        if np.argmax(y[i]) == 1:
            Y[i] = 1
        else:
            Y[i] = 0
    return Y
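(As an aside, these two helpers are just one-hot encoding and argmax; hypothetical vectorized equivalents, named encode_v/decode_v here purely for illustration, could be:)

from tensorflow.keras.utils import to_categorical

def encode_v(y):
    # same one-hot layout as encode(): 1 -> [0, 1], 0 -> [1, 0]
    return to_categorical(y, num_classes=2)

def decode_v(y):
    # class index per row, like decode()
    return np.argmax(y, axis=1)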

Solution

  • Since you use a last layer of

    model.add(Dense(2, activation='softmax'))
    

    you should not use loss='binary_crossentropy' in model.compile(); use loss='categorical_crossentropy' instead (see the sketch after this list).

    Due to this mistake, the results shown during model fitting are probably wrong; the results returned by sklearn's f1_score are the real ones.

    Irrelevant to your question (as I guess the follow-up one will be how to improve it?), we practically never use activation='tanh' for the hidden layers (try relu instead). Also, dropout should not be used by default (especially with such a high value of 0.5); comment-out all dropout layers and only add them back if your model overfits (using dropout when it is not needed is known to hurt performance).