I've been working on a neural network to classify two sets of astronomical data. I believe the network is struggling because the two sets are quite similar, but even when I make the data significantly different, the accuracy history still doesn't behave the way I'd expect.
These are example images from each dataset:
I'm currently using 10,000 images of each type (20,000 total), with 20% going to validation, so 16,000 training images and 4,000 validation images. Due to memory constraints, I can't increase the datasets much beyond this.
This is my current model:
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(64, (3, 3), padding="valid", activation='relu', input_shape=(192, 192, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (7, 7), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (9, 9), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (7, 7), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(2, activation="sigmoid"))
which I'm compiling with:
import tensorflow as tf
from tensorflow.keras.optimizers import SGD

opt = SGD(learning_rate=0.1)  # 'lr' is a deprecated alias for 'learning_rate'
model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
and fitting using:
history = model.fit(train, train_labels, batch_size=200, epochs=15, validation_data=(validation, validation_labels))
If I add something to the data to make the datasets unrealistically different (e.g. adding a random rectangle to the middle of the data or by adding a mask to one but not the other), I get an accuracy history that looks like this:
(Note that the accuracy history for the training data was shifted half an epoch to the left to account for the training accuracy being measured, on average, half an epoch before the validation accuracy.)
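(For reference, a minimal sketch of how such a half-epoch shift can be applied when plotting, assuming a standard Keras History object named history:)
import numpy as np
import matplotlib.pyplot as plt

epochs = np.arange(1, len(history.history['accuracy']) + 1)
# Training accuracy is averaged over the epoch, so plot it half an epoch earlier
plt.plot(epochs - 0.5, history.history['accuracy'], label='training accuracy')
plt.plot(epochs, history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()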
If I make the datasets very similar (e.g. adding nothing to the datasets or applying the same mask to both), I get an accuracy history that looks like this:
or occasionally with a big spike in validation accuracy for one epoch, like so:
Looking at different websites and other Stack Overflow posts, I've tried:
Am I missing something? Are these datasets just too similar for a binary classification network to tell apart?
If this is a binary classification problem, then you need to change:
model.add(layers.Dense(2, activation="sigmoid"))
into:
model.add(layers.Dense(1, activation="sigmoid"))
A sigmoid output indicates that if the value is greater than some threshold (usually 0.5), the sample belongs to the second class, and to the first class otherwise. Also, you really don't need to use from_logits=True, since you specified an activation in the last dense layer.
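For example, converting the sigmoid outputs into class predictions could look like this (a sketch, assuming validation is the array from the question):
probs = model.predict(validation)        # shape (n_samples, 1), values in [0, 1]
pred_labels = (probs > 0.5).astype(int)  # 1 = second class, 0 = first class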
Recall that your loss should then be:
tf.keras.losses.BinaryCrossentropy(from_logits=False)
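Putting both changes together, a minimal sketch of the corrected final layer and compile step (keeping the SGD optimizer from the question) would be:
model.add(layers.Dense(1, activation='sigmoid'))  # single unit for binary output

model.compile(optimizer=opt,
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])
Your existing integer 0/1 labels work unchanged with BinaryCrossentropy.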
If you want to set from_logits=True, then your last dense layer should look like this:
model.add(layers.Dense(1))  # no activation, i.e. a linear output (raw logits)
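The loss then has to match and expect raw logits:
model.compile(optimizer=opt,
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])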
You can also use 2 neurons in the last dense layer, but then you need to use a softmax activation with a categorical loss (e.g. SparseCategoricalCrossentropy for integer labels).
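For completeness, a sketch of that two-neuron alternative:
model.add(layers.Dense(2, activation='softmax'))  # one probability per class

model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
SparseCategoricalCrossentropy fits here because your labels are plain integers (0 or 1) rather than one-hot vectors.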