I am trying to build an image classification model using an Inception network as the base. It is a simple binary classification model.
My images live in many smaller directories inside one big directory. Each image has its own 'image id', which is also its file name. In addition, I have a few TSV files that map these image ids to their labels ('Positive' or 'Negative').
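For context, the layout looks roughly like this (the group and file names below are made-up placeholders; image_id and label_text_image are the actual column names in my TSV files):

/path/to/img/dir/
    group_1/
        subgroup_1/
            abc123.jpg    <- 'abc123' is the image id
            abc124.jpg
    group_2/
        ...

Each TSV row maps an image_id to a label_text_image value of 'Positive' or 'Negative'.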
When I train the model, I see that my accuracy fluctuates without much progress. I was wondering if there is anything wrong with the way that I have prepared my dataset. I have written a few functions for this purpose.
Before I get to these functions, given below is how I have defined my model,
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.layers import GlobalAveragePooling2D, Dropout, Dense
from tensorflow.keras.models import Model

base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D(name='avg_pool')(x)
x = Dropout(0.4)(x)
predictions = Dense(2, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# freeze the pretrained base so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
These are the functions that I have written in order to prepare my data,
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.inception_v3 import preprocess_input

def vectorize_img(img_path):
    img = load_img(img_path, target_size=(224, 224))  # resize; note InceptionV3's native input size is 299x299
    x = img_to_array(img)  # convert to a numpy array
    x = preprocess_input(x)  # scale pixels to the range InceptionV3 expects
    return x
import os

def prepare_features(base_dir, limit):
    features_dict = dict()
    for dir1 in os.listdir(base_dir):
        for dir2 in os.listdir(os.path.join(base_dir, dir1)):
            for file in os.listdir(os.path.join(base_dir, dir1, dir2)):
                if len(features_dict) < limit:
                    try:
                        img_path = os.path.join(base_dir, dir1, dir2, file)
                        x = vectorize_img(img_path)
                        name_id = file.split('.')[0]  # use the file name (without extension) as the id
                        features_dict[name_id] = x
                    except Exception as e:
                        print(e)
    return features_dict
import numpy as np
import pandas as pd
import tensorflow as tf

def prepare_data(file_path, features_dict):
    inputs = []
    labels = []
    df = pd.read_csv(file_path, sep='\t')
    df = df[['image_id', 'label_text_image']]
    df['class'] = df['label_text_image'].apply(lambda label: 1 if label == 'Positive' else 0)
    for index, row in df.iterrows():
        try:
            inputs.append(features_dict[row['image_id']])
            labels.append(row['class'])
        except KeyError:  # skip ids that were never vectorized above
            pass
    return np.asarray(inputs), tf.one_hot(np.asarray(labels), depth=2)
These functions are then called to prepare my dataset,
features_dict = prepare_features('/path/to/img/dir', 8000)
x_train, y_train = prepare_data('/path/to/train/tsv', features_dict)
x_dev, y_dev = prepare_data('/path/to/dev/tsv', features_dict)
x_test, y_test = prepare_data('/path/to/test/tsv', features_dict)
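As a quick sanity check before training (a minimal sketch, assuming the arrays built above), I can print the shapes and per-class counts:

# hypothetical debugging snippet: verify shapes and class balance
print(x_train.shape, y_train.shape)        # expect (N, 224, 224, 3) and (N, 2)
print(np.asarray(y_train).sum(axis=0))     # per-class counts from the one-hot labels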
Finally, the model is trained,
EPOCHS = 50
BATCH_SIZE = 32
STEPS_PER_EPOCH = 1
history = model.fit(x=x_train, y=y_train,
                    validation_data=(x_dev, y_dev),
                    epochs=EPOCHS,
                    steps_per_epoch=STEPS_PER_EPOCH,
                    batch_size=BATCH_SIZE)
model.evaluate(x=x_test, y=y_test, batch_size=BATCH_SIZE)
Am I doing something wrong?
Here are the results that my model achieves,
Epoch 1/50
1/1 [==============================] - 158s 158s/step - loss: 0.8298 - accuracy: 0.5000 - val_loss: 0.7432 - val_accuracy: 0.5227
Epoch 2/50
1/1 [==============================] - 113s 113s/step - loss: 0.7775 - accuracy: 0.4688 - val_loss: 0.8225 - val_accuracy: 0.5153
Epoch 3/50
1/1 [==============================] - 113s 113s/step - loss: 0.7663 - accuracy: 0.5625 - val_loss: 0.8431 - val_accuracy: 0.5174
Epoch 4/50
1/1 [==============================] - 156s 156s/step - loss: 1.1292 - accuracy: 0.5312 - val_loss: 0.7763 - val_accuracy: 0.5227
Epoch 5/50
1/1 [==============================] - 114s 114s/step - loss: 0.7452 - accuracy: 0.5312 - val_loss: 0.7332 - val_accuracy: 0.5448
Epoch 6/50
1/1 [==============================] - 156s 156s/step - loss: 0.7884 - accuracy: 0.5312 - val_loss: 0.7072 - val_accuracy: 0.5606
Epoch 7/50
1/1 [==============================] - 114s 114s/step - loss: 0.7856 - accuracy: 0.5312 - val_loss: 0.7195 - val_accuracy: 0.5764
Epoch 8/50
1/1 [==============================] - 156s 156s/step - loss: 0.9203 - accuracy: 0.5312 - val_loss: 0.7348 - val_accuracy: 0.5616
Epoch 9/50
1/1 [==============================] - 156s 156s/step - loss: 0.8639 - accuracy: 0.4062 - val_loss: 0.7275 - val_accuracy: 0.5690
Epoch 10/50
1/1 [==============================] - 156s 156s/step - loss: 0.6170 - accuracy: 0.7188 - val_loss: 0.7125 - val_accuracy: 0.5880
Epoch 11/50
1/1 [==============================] - 156s 156s/step - loss: 0.5756 - accuracy: 0.7188 - val_loss: 0.6979 - val_accuracy: 0.6017
Epoch 12/50
1/1 [==============================] - 113s 113s/step - loss: 0.9976 - accuracy: 0.4375 - val_loss: 0.6834 - val_accuracy: 0.5933
Epoch 13/50
1/1 [==============================] - 156s 156s/step - loss: 0.7025 - accuracy: 0.5938 - val_loss: 0.6863 - val_accuracy: 0.5838
You mentioned that this is binary classification, hence the labels are {0, 1}. In that case your model's output layer should be either

predictions = Dense(2, activation='softmax')(x)

with categorical (one-hot) labels [0, 1] or [1, 0], or

predictions = Dense(1, activation='sigmoid')(x)

with a plain binary label 1 or 0. Instead, you are using a 2-unit output with a sigmoid activation, i.e.

predictions = Dense(2, activation='sigmoid')(x)

which mixes the two conventions.
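A minimal sketch of the second option, assuming you keep binary_crossentropy and reuse the x and base_model from your question; note that prepare_data must then return plain 0/1 labels rather than the one-hot vectors from tf.one_hot:

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# single sigmoid unit: the output is P(label == 1)
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# in prepare_data, return the raw integer labels instead of one-hot vectors:
# return np.asarray(inputs), np.asarray(labels)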