I've built a binary classifier in TensorFlow that classifies a 16x16 grayscale image into one of two classes, with an 87-13 class distribution. The issue I'm having is that the model's log loss converges to ~0.4, which is better than random, but I cannot get it to improve beyond this.
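For context on the ~0.4 figure, it may help to compare it against the log loss of a constant predictor that always outputs the class prior. This is a minimal sketch, assuming natural-log cross-entropy (which is what `tf.nn.sigmoid_cross_entropy_with_logits` computes); `baseline_log_loss` is a hypothetical helper, not part of the original code.

```python
import math

def baseline_log_loss(positive_rate):
    """Log loss of a constant predictor that always outputs the class prior."""
    p = positive_rate
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

print(round(baseline_log_loss(0.5), 3))   # balanced 50-50 data -> 0.693
print(round(baseline_log_loss(0.13), 3))  # 87-13 data -> 0.386
```

The baseline is about 0.693 on balanced data but about 0.386 on 87-13 data, so whether ~0.4 counts as a clear improvement over a trivial predictor depends on which distribution the loss is measured on.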
The vision problem is in the realm of video encoding. This image should provide some understanding of the problem, where images are either to be split or not (0/1) based on their homogeneity. Note that squares near edges are more likely to be sub-split into smaller ones.
When validating the model (1.1e7 examples, 87-13 distribution), I cannot achieve an F1-score better than ~50%.
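Since F1 depends on the decision threshold, and the model code rounds the sigmoid output (effectively a fixed 0.5 threshold), one thing that may be worth checking is how F1 varies with the threshold on validation data. The sketch below is plain Python; `f1_at_threshold`, `best_threshold`, and the toy numbers are hypothetical illustrations, not the author's code or data.

```python
def f1_at_threshold(labels, probs, threshold):
    """F1 for the positive class (label 1) at a given probability threshold."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(labels, probs, candidates):
    """Return (threshold, f1) for the candidate threshold with the highest F1."""
    scored = ((t, f1_at_threshold(labels, probs, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])

# Toy example with made-up numbers (6 negatives, 2 positives).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1, 0.2, 0.55, 0.6, 0.3, 0.4, 0.7, 0.9]
print(f1_at_threshold(labels, probs, 0.5))
print(best_threshold(labels, probs, [0.5, 0.65, 0.8]))
```

In this toy case, raising the threshold removes the two false positives and improves F1. Whether the same holds for the real validation set would have to be checked on the actual predictions.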
My training data consists of 2.2e8 examples, which are oversampled/undersampled to achieve a 50-50 distribution. I'm using a batch size of 1024 and a substantial shuffle buffer (the data isn't ordered to begin with). The model is optimised using Adam with default hyperparameters.
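The resampling to a 50-50 distribution described above might look roughly like the following. This is an assumption about the procedure, not the author's actual pipeline; `rebalance` and `target_per_class` are hypothetical names, and the sketch works on in-memory lists rather than a `tf.data` input pipeline.

```python
import random

def rebalance(examples, labels, target_per_class, seed=0):
    """Return a shuffled list of (example, label) pairs with equal class counts.

    Assumes binary labels (0/1) and at least one example of each class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for x, y in zip(examples, labels):
        by_class[y].append(x)
    out = []
    for y, items in by_class.items():
        if len(items) >= target_per_class:
            # Undersample: draw without replacement.
            chosen = rng.sample(items, target_per_class)
        else:
            # Oversample: draw with replacement.
            chosen = [rng.choice(items) for _ in range(target_per_class)]
        out.extend((x, y) for x in chosen)
    rng.shuffle(out)
    return out

# Toy usage: 100 examples with an 87-13 split, rebalanced to 20 per class.
examples = list(range(100))
labels = [1 if i < 13 else 0 for i in range(100)]
balanced = rebalance(examples, labels, target_per_class=20)
```

A model trained on data rebalanced this way sees a different class prior than the 87-13 validation set, so its predicted probabilities may not transfer directly between the two.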
Things I've tried to improve the performance (test (outcome)):
I've been stuck trying to get the performance to improve, and I think I've read every SO question that I could find. Any advice would be a great help.
import tensorflow as tf  # TF 1.x API (tf.layers, tf.estimator, tf.train)


def cnn_model(features, labels, mode):
    # downsample to 8x8 using 2x2 local averaging
    features_8x8 = tf.nn.avg_pool(
        value=tf.cast(features["x"], tf.float32),
        ksize=[1, 2, 2, 1],
        strides=[1, 2, 2, 1],
        padding="SAME",
        data_format='NHWC'
    )
    conv2d_0 = tf.layers.conv2d(inputs=features_8x8,
                                filters=6,
                                kernel_size=[3, 3],
                                strides=(1, 1),
                                activation=tf.nn.relu,
                                name="conv2d_0")
    pool0 = tf.layers.max_pooling2d(
        inputs=conv2d_0,
        pool_size=(2, 2),
        strides=(2, 2),
        padding="SAME",
        data_format='channels_last'
    )
    conv2d_1 = tf.layers.conv2d(inputs=pool0,
                                filters=16,
                                kernel_size=[3, 3],
                                strides=(3, 3),
                                activation=tf.nn.relu,
                                name="conv2d_1")
    # flatten the conv output to one 16-element vector per example
    reshape1 = tf.reshape(conv2d_1, [-1, 16])
    dense0 = tf.layers.dense(inputs=reshape1,
                             units=10,
                             activation=tf.nn.relu,
                             name="dense0")
    logits = tf.layers.dense(inputs=dense0,
                             units=1,
                             name="logits")
    # ########################################################
    predictions = {
        "classes": tf.round(tf.nn.sigmoid(logits)),
        "probabilities": tf.nn.sigmoid(logits)
    }
    # ########################################################
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode,
                                          predictions=predictions)
    # ########################################################
    cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.cast(labels['y'], tf.float32),
        logits=logits
    )
    loss = tf.reduce_mean(cross_entropy)
    # ########################################################
    # Configure the Training Op (for TRAIN mode)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimiser = tf.train.AdamOptimizer(learning_rate=0.001,
                                           beta1=0.9,
                                           beta2=0.999,
                                           epsilon=1e-08)
        train_op = optimiser.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode,
                                          loss=loss,
                                          train_op=train_op)
    # Add evaluation metrics (for EVAL mode)
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels["y"],
            predictions=predictions["classes"]),
    }
    return tf.estimator.EstimatorSpec(mode=mode,
                                      loss=loss,
                                      eval_metric_ops=eval_metric_ops)
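To sanity-check the `tf.reshape(conv2d_1, [-1, 16])` step, the spatial sizes through the network can be traced by hand. The sketch below assumes a 16x16 input and the default `padding="valid"` for `tf.layers.conv2d` (the model code does not set it explicitly); `out_size` is a hypothetical helper.

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # 'VALID'

size = 16
size = out_size(size, 2, 2, "SAME")   # avg_pool      -> 8
size = out_size(size, 3, 1, "VALID")  # conv2d_0      -> 6
size = out_size(size, 2, 2, "SAME")   # max_pooling2d -> 3
size = out_size(size, 3, 3, "VALID")  # conv2d_1      -> 1
flat_features = size * size * 16      # 16 filters in conv2d_1
print(flat_features)  # 16
```

Under these assumptions each example reaches `dense0` as a 16-element vector, so the reshape is consistent with a 16x16 input.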
It seems that you have done a lot already. My next steps would be visualization of
Possibly, you are tackling a very difficult vision problem. Can we see the images or get a sample of the data? Then experienced people could try to come up with a basic model that (hopefully) works.