keras conv-neural-network large-language-model image-classification

ValueError: Input 0 of layer "sequential_7" is incompatible with the layer: expected shape=(None, 224, 224, 1), found shape=(None, 244, 1)

Introduction :

I created an image classification model to classify whether a hip implant is loose or in control, based on an X-ray image.
The data is in a CSV file with two columns (image path and image class), and the CSV file is uploaded to a GCS bucket.
The model is trained using images resized to width=224, height=224.

Constraints :

The number of images is limited, so there is not much data to train the model. This is acceptable because the focus is on getting the model to work rather than on the accuracy of its predictions.

Issue : I want to feed the model a single image and get a prediction (class probabilities). However, calling model.predict raises the following error :

"ValueError: Input 0 of layer "sequential_7" is incompatible with the layer: expected shape=(None, 224, 224, 1), found shape=(None, 244, 1)"

Below are the code snippets, in order :

Data Pipeline (reading csv, resizing images)
Create and train model
Prediction using a single image

# 1. DATA PIPELINE

CLASS_NAMES = ['loose', 'control']

def decode_csv(csv_row): # csv_row consists of a file path and the image class
    
    record_defaults = ["path", "image class"] # Default values for the dataset
    filename, label_string = tf.io.decode_csv(csv_row, record_defaults) # tf.io.decode_csv reads every row in the csv
    
    image_bytes = tf.io.read_file(filename=filename) # output: raw encoded image bytes (a string tensor, not base64)
    image_bytes = tf.image.decode_jpeg(image_bytes) # output: an integer array
    image_bytes = tf.image.convert_image_dtype(image_bytes, tf.float32) # output: 0 - 1 range float
    image_bytes = tf.image.resize(image_bytes, [224, 224]) # output: image resized to 224 x 224
    
    label = tf.math.equal(CLASS_NAMES, label_string) # formats label to a boolean array with a truth value corresponding to the output class
    
    return image_bytes, label # Returns the resized float image tensor and a boolean array with True at the position of the image's class

def load_dataset(csv_file, batch_size, training=True):
    ds = tf.data.TextLineDataset(filenames=csv_file).skip(1) # skip(1) will remove the top row i.e. header
    ds = ds.map(decode_csv).cache()
    ds = ds.batch(batch_size=batch_size)
    
    if training:
        ds = ds.shuffle(10).repeat()
    return ds

train_ds = load_dataset("gs://qwiklabs-asl-04-06351f77b64f-hip-implant/hip-implant-data.csv", batch_size = 10)

validation_data = load_dataset("gs://qwiklabs-asl-04-06351f77b64f-hip-implant/hip-implant-data.csv", batch_size = 10, training=False)
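The label encoding in decode_csv can be illustrated without TensorFlow. The sketch below is only a NumPy stand-in for tf.math.equal(CLASS_NAMES, label_string); the helper name one_hot_label is hypothetical and not part of the original code.

```python
import numpy as np

CLASS_NAMES = ["loose", "control"]

def one_hot_label(label_string):
    """Compare the label against every class name, element by element."""
    # Result is a boolean array with True only at the matching class position.
    return np.array(CLASS_NAMES) == label_string

print(one_hot_label("control").tolist())  # [False, True]
print(one_hot_label("loose").tolist())    # [True, False]
```

Each label therefore has one entry per class, which matches the two-unit output of the model defined in the next snippet.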

# 2. CREATE MODEL

IMG_HEIGHT = 224
IMG_WIDTH = 224
IMG_CHANNELS = 64

model = Sequential([
    Conv2D(name="first-Conv2D-layer",filters=64, kernel_size=3, input_shape=(IMG_WIDTH, IMG_HEIGHT, 1), padding='same', activation='relu'),
    MaxPooling2D(name="first-pooling-layer",strides=2, padding='same'),
    Conv2D(name="second-Conv2D-layer", filters=32, kernel_size=3, activation='relu'),
    MaxPooling2D(name="second-pooling-layer", strides=2, padding='same'),
    Flatten(),
    Dense(units=400, activation='relu'),
    Dense(units=100, activation='relu'),
    Dropout(0.25),
    Dense(2),
    Softmax()
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
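To see how the spatial dimensions shrink through the layers above, here is a rough sketch of the shape arithmetic. It assumes the Keras defaults for anything not set in the model (Conv2D strides of 1 and padding='valid', MaxPooling2D pool size of 2); the helper output_size is hypothetical and only mirrors the usual 'same' and 'valid' padding rules.

```python
import math

def output_size(n, kernel, stride, padding):
    """Output length along one spatial axis for a conv or pooling layer."""
    if padding == "same":
        # Input is padded so that only the stride reduces the size.
        return math.ceil(n / stride)
    # "valid": no padding, the window must fit entirely inside the input.
    return (n - kernel) // stride + 1

size = 224
size = output_size(size, kernel=3, stride=1, padding="same")   # first Conv2D   -> 224
size = output_size(size, kernel=2, stride=2, padding="same")   # first pooling  -> 112
size = output_size(size, kernel=3, stride=1, padding="valid")  # second Conv2D  -> 110
size = output_size(size, kernel=2, stride=2, padding="same")   # second pooling -> 55

flattened_units = size * size * 32  # 32 filters in the second Conv2D
print(size, flattened_units)  # 55 96800
```

Under these assumptions, Flatten passes 96800 values to the first Dense layer, and every one of these numbers depends on the input being exactly 224 x 224 x 1.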

Model Summary : [image not available]

# 3. PREDICTION USING A SINGLE IMAGE:

image_path = tf.io.read_file("gs://qwiklabs-asl-04-06351f77b64f-hip-implant/Control/control (25).png")

new_image = decode_img(image_path, [244, 244])

print(new_image.shape)
plt.imshow(new_image.numpy())

prediction = model.predict(new_image)
print(prediction)

Resolutions already tried :

Tried keeping padding='same' in the convolutional layers (in response to an initial error reporting a dimension mismatch within the convolutional layer).
Tried explicitly specifying the input shape to the model (244,244,1) by adding the layer Input(shape=(244,244,1)).
Tried changing the filter size / units / pool size (in response to another error stating that the layer cannot reduce dimensionality further).

Edit 1 : I missed mentioning the decode_img function, which resizes the test image (the single image used for prediction).

img = tf.io.read_file("gs://qwiklabs-asl-04-06351f77b64f-hip-implant/Control/control (25).png")

def decode_img(img, reshape_dims):
    img = tf.image.decode_jpeg(img) # tf.image.decode_jpeg decodes the raw encoded image bytes into an integer array
    #print("\n tf.image.decode_jpeg : Convert raw encoded image bytes into an integer array \n")
    #print(img)
    img = tf.image.convert_image_dtype(img, tf.float32) # tf.image.convert_image_dtype casts the integer array into 0 - 1 range float
    #print("\n tf.image.convert_image_dtype : Cast the integer array into 0 - 1 range float \n")
    #print(img)
    img = tf.image.resize(img, reshape_dims) # tf.image.resize makes image dimensions consistent for our neural network
    #print("\n tf.image.resize : Keep image dimensions consistent for our neural network \n")
    #print(img)
    return img


img = decode_img(img, [224, 224])

plt.imshow(img.numpy())

Solution

Credit to the user who provided the resolution : https://www.reddit.com/r/learnmachinelearning/comments/18iaf0b/comment/kdbxjqw/?context=3

"This error is most common when the data you're feeding to a network is the wrong shape. If each sample is size (224, 224, 1), then your input must be rank 4, with shape (n_batch, 224, 224, 1).

If you are trying to test the model with one image, you might accidentally feed it a tensor with shape (224, 224, 1). This is wrong, the correct way to feed in one image is with a shape (1, 224, 224, 1).

You can use numpy.stack, which given several (224, 224, 1) arrays, can combine them along axis = 0 to a (n_batch, 224, 224, 1) shape."

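Since the goal was to get a prediction for a single image, I wrapped that image in a batch of one using numpy.stack((image,), axis=0) and passed the result to model.predict.

Note that the image must also be resized to 224 x 224, as in Edit 1. The [244, 244] used in snippet 3 produces a (244, 244, 1) tensor, and because Keras treats the first axis as the batch axis, it reads that tensor as 244 samples of shape (244, 1). That is where the "found shape=(None, 244, 1)" in the error comes from.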
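A minimal sketch of the fix, using NumPy only so it runs without the trained model. The helper to_batch and the constant EXPECTED_SAMPLE_SHAPE are hypothetical names introduced for illustration, and the zero-filled array is just a stand-in for the output of decode_img(img, [224, 224]).numpy().

```python
import numpy as np

EXPECTED_SAMPLE_SHAPE = (224, 224, 1)  # per-image shape the model was built with

def to_batch(image):
    """Check a single image's shape, then wrap it in a batch of one."""
    image = np.asarray(image)
    if image.shape != EXPECTED_SAMPLE_SHAPE:
        raise ValueError(f"expected {EXPECTED_SAMPLE_SHAPE}, got {image.shape}")
    return np.stack((image,), axis=0)  # (224, 224, 1) -> (1, 224, 224, 1)

# Stand-in for the decoded, resized image (assumption: single-channel float image).
image = np.zeros((224, 224, 1), dtype=np.float32)
batch = to_batch(image)
print(batch.shape)  # (1, 224, 224, 1)

# np.expand_dims produces the same rank-4 shape.
assert np.expand_dims(image, axis=0).shape == batch.shape

# prediction = model.predict(batch)  # model from snippet 2; not executed in this sketch
```

The explicit shape check is optional, but it surfaces a wrong resize (such as 244 instead of 224) before the array ever reaches model.predict.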