I am working on a neural network that predicts heart disease. The data comes from Kaggle and has been pre-processed. I have already trained several classical models on it, such as logistic regression, random forests, and an SVM, and they all produce solid results. I'm now training a neural network on the same data to see whether it can outperform those models (the data set is rather small, which may explain the poor results). The network below produces 50% accuracy, which is obviously too low to be useful. From what you can tell, does anything look off that would undermine the accuracy of the model?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.layers import Dense, Dropout
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping
df = pd.read_csv(r"C:\Users\***\Desktop\heart.csv")
X = df[['age','sex','cp','trestbps','chol','fbs','restecg','thalach']].values
y = df['target'].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit_transform(X_train)
scaler.transform(X_test)
nn = tf.keras.Sequential()
nn.add(Dense(30, activation='relu'))
nn.add(Dropout(0.2))
nn.add(Dense(15, activation='relu'))
nn.add(Dropout(0.2))
nn.add(Dense(1, activation='sigmoid'))
nn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)
nn.fit(X_train, y_train, epochs=1000, validation_data=(X_test, y_test), callbacks=[early_stop])
model_loss = pd.DataFrame(nn.history.history)
model_loss.plot()
predictions = nn.predict_classes(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(classification_report(y_test,predictions))
print(confusion_matrix(y_test,predictions))
The scaler is not scaling in place: fit_transform and transform return new arrays and leave their inputs untouched, so your network is actually training on the raw, unscaled features. You need to assign the returned results back:
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
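If it helps to see the behavior in isolation, here is a minimal sketch with toy data (the variable names are just for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])
scaler = StandardScaler()

scaler.fit_transform(X)             # the scaled array is returned and then discarded
print(X.ravel())                    # [1. 2. 3.] -- X itself is untouched

X_scaled = scaler.fit_transform(X)  # keep the returned array instead
print(X_scaled.ravel())             # roughly [-1.22  0.  1.22]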
Because the raw features sit on wildly different scales (chol runs into the hundreds while sex is 0/1), the unscaled network barely learns and hovers around chance, which is why you see 50% accuracy. With the fix applied, you'll get results more in line with what you were expecting:
              precision    recall  f1-score   support

           0       0.93      0.98      0.95       144
           1       0.98      0.93      0.96       164

    accuracy                           0.95       308
   macro avg       0.95      0.96      0.95       308
weighted avg       0.96      0.95      0.95       308
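One side note: predict_classes was deprecated and then removed from Keras (it is gone as of TensorFlow 2.6), so the prediction line will break if you upgrade. For a single sigmoid output, the usual replacement is to threshold predict yourself; a sketch, assuming the same model and test set as above:

predictions = (nn.predict(X_test) > 0.5).astype("int32")  # probabilities -> 0/1 labels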