python tensorflow machine-learning structured-data

Why is my loss trending down while my accuracy is going to zero?

I am trying to practice my machine learning skills with TensorFlow/Keras, but I am having trouble fitting the model. Let me explain what I've done and where I'm at.

I am using the dataset from Kaggle's Costa Rican Household Poverty Level Prediction Challenge.

Since I am just trying to get familiar with the TensorFlow workflow, I cleaned the dataset by removing a few columns that had a lot of missing data and then filled the remaining missing values with each column's mean. So there are no missing values in my dataset.

Next, I loaded the cleaned CSV using make_csv_dataset from TensorFlow.

batch_size = 32

train_dataset = tf.data.experimental.make_csv_dataset(
    'clean_train.csv',
    batch_size,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)

I set up a function to return my compiled model like so:

f1_macro = tfa.metrics.F1Score(num_classes=4, average='macro')

def get_compiled_model():
    model = tf.keras.Sequential([
      tf.keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(137,)),  # input shape required
      tf.keras.layers.Dense(256, activation=tf.nn.relu),
      tf.keras.layers.Dense(4, activation=tf.nn.softmax)
    ])

    model.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=[f1_macro, 'accuracy'])
    return model

model = get_compiled_model()
model.fit(train_dataset, epochs=15)

Below is the result of that

A link to my notebook is Here

I should mention that I strongly based my implementation on Tensorflow's iris data walkthrough

Thank you!

Solution

After a while, I was able to find the issues with your code. They are listed in order of importance (most important first).

You are doing multi-class classification (not binary classification), so your loss should be categorical_crossentropy.
You are not one-hot encoding your labels. Combining binary_crossentropy with labels stored as numerical IDs will not work. Instead, one-hot encode your labels and treat this as a multi-class classification problem. Here's how you do that:

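def pack_features_vector(features, labels):
    """Pack the features into a single array and one-hot encode the labels."""
    # Stack the per-column tensors into one (batch_size, num_features) tensor.
    features = tf.stack(list(features.values()), axis=1)
    # The labels are assumed to be 1..4, so shift them to 0..3 before encoding.
    labels = tf.one_hot(tf.cast(labels - 1, tf.int32), depth=4)
    return features, labels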
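
To see why a raw class ID is a poor target for binary cross-entropy, here is a minimal pure-Python sketch (no TensorFlow) of the two loss formulas. The helper names one_hot, binary_crossentropy and categorical_crossentropy are illustrative; they follow the textbook definitions for a single sample and are not Keras's exact implementation, which also clips probabilities and averages over the batch. The sketch assumes the labels run from 1 to 4, as the shift by one in pack_features_vector above implies.

```python
import math

NUM_CLASSES = 4  # assumption: labels are the integers 1..4


def one_hot(label, depth=NUM_CLASSES):
    """Turn a 1-based class ID into a one-hot list, e.g. 3 -> [0, 0, 1, 0]."""
    index = label - 1  # same shift by one as in pack_features_vector
    return [1.0 if i == index else 0.0 for i in range(depth)]


def binary_crossentropy(y_true, p):
    """Textbook binary cross-entropy; only meaningful for y_true in {0, 1}."""
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def categorical_crossentropy(y_onehot, probs):
    """Textbook categorical cross-entropy for one sample."""
    return -sum(t * math.log(p) for t, p in zip(y_onehot, probs))


# Feeding the raw ID 4 into binary cross-entropy: the value keeps falling
# below zero as p grows, so a shrinking "loss" says nothing about correctness.
raw_id_losses = [binary_crossentropy(4, p) for p in (0.5, 0.9, 0.99)]

# With a one-hot target the loss is non-negative and shrinks only when the
# probability assigned to the true class grows.
target = one_hot(4)
bad_guess = categorical_crossentropy(target, [0.7, 0.1, 0.1, 0.1])
good_guess = categorical_crossentropy(target, [0.05, 0.05, 0.1, 0.8])

print(raw_id_losses)
print(bad_guess, good_guess)
```

This is only meant to build intuition for the numbers you see during training. In the real model the change is simply to compile with loss='categorical_crossentropy' and feed the one-hot labels.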

Normalize your data. If you look at your training data, the features are not normalized and their values are all over the place. You should therefore consider normalizing them with something like the code below. This is just for demonstration purposes; read about the scalers in scikit-learn and choose what's best for you.

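from sklearn import preprocessing

x = train_df[feature_names].values  # returns a NumPy array
scaler = preprocessing.StandardScaler()  # zero mean, unit variance per feature
x_scaled = scaler.fit_transform(x)
# Write the scaled values back so the column names and the label column are kept.
train_df[feature_names] = x_scaled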
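
For intuition, standardization rescales each feature column to zero mean and unit variance. The sketch below does this by hand for a single made-up column; the values and the standardize helper are hypothetical and not taken from the dataset. StandardScaler applies the same formula to every column, using the population standard deviation.

```python
import statistics


def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        # Constant column: nothing to scale, so map every value to 0.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


# Hypothetical raw feature with a large range.
raw_values = [100000.0, 150000.0, 200000.0, 250000.0]
scaled = standardize(raw_values)
print(scaled)
```

In practice you would compute the mean and standard deviation on the training data only and reuse them for the test data, which is what calling fit on the training set and transform on the test set does.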

Fixing these issues should set your model straight.