python tensorflow machine-learning keras deep-learning

Tensorflow model predicting Nans

I am new to TensorFlow framework and I am trying to apply Tensorflow to predict the survivor based on this Titanic Dataset:https://www.kaggle.com/c/titanic/data.

import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
#%%
titanictrain = pd.read_csv('train.csv')
titanictest = pd.read_csv('test.csv')
df = pd.concat([titanictrain,titanictest],join='outer',keys='PassengerId',sort=False,ignore_index=True).drop(['Name'],1)

#%%

def preprocess(df):
    df['Fare'].fillna(value=df.groupby('Pclass')['Fare'].transform('median'),inplace=True)
    df['Fare'] = df['Fare'].map(lambda x: np.log(x) if x>0 else 0)
    df['Embarked'].fillna(value=df['Embarked'].mode()[0],inplace=True)
    df['CabinAlphabet'] = df['Cabin'].str[0]
    categories_to_one_hot = ['Pclass','Sex','Embarked','CabinAlphabet']
    df = pd.get_dummies(df,columns=categories_to_one_hot,drop_first=True)
    return df
df = preprocess(df)

df = df.drop(['PassengerId','Ticket','Cabin','Survived'],1)

titanic_trainandval = df.iloc[:len(titanictrain)]
titanic_test = df.iloc[len(titanictrain):] #test after preprocessing

titanic_test.head()

#  split train into training and validation set
labels = titanictrain['Survived']
y = labels.values

test = titanic_test.copy() # real test sets
print(len(test), 'test examples')

Here I am trying to apply preprocessing on the data:

1.Drop Name column and Do one hot coding both on the train and test set

2.Drop ['PassengerId','Ticket','Cabin','Survived'] for Simplicity.

Split train and test following the original order

Here is a picture showing what the training set looks like.

"""# model training"""

from tensorflow.keras.layers import Input, Dense, Activation,Dropout
from tensorflow.keras.models import Model

X = titanic_trainandval.copy()
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(10, activation='relu')(input_layer)
dense_layer_2 = Dense(5, activation='relu')(dense_layer_1)
output = Dense(1, activation='softmax',name = 'predictions')(dense_layer_2)

model = Model(inputs=input_layer, outputs=output)
base_learning_rate = 0.0001

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate), metrics=['acc'])

history = model.fit(X, y, batch_size=5, epochs=20, verbose=2, validation_split=0.1,shuffle = False)

submission = pd.DataFrame()
submission['PassengerId'] = titanictest['PassengerId']

Then I put the training set X into the model to get the result. However, history shows the following result:

No matter how I change the learning rate and batch size, the result does not change, and the loss is always 'nan', and the prediction based on the test set is always 'nan' as well.

Could anybody explain where the problem is and give some possible solutions?

Solution

at first glance there are 2 major problems in your code:

your output layer must be Dense(2, activation='softmax'). this is because yours is a binary classification problem and if you are using softmax to generate probabilities the output dim must be equal to the number of classes. (you can use one output dimension with sigmoid activation)
you have to change your loss function. with softmax and numerical encoded target use sparse_categorical_crossentropy. (you can use binary_crossentropy with sigmoid and with from_logits=False as default)

PS: make sure to remove all NaNs in your original data before starting fit