I would like to define a Keras Sequential model between image pairs (9,000 images), where the input shape is (5, 5, 3), since each input is a 5x5-pixel RGB image, and the output shape is (5, 8), where the output image has only one band of information. The input set contains satellite images; the output contains per-pixel canopy height information.
The input image set is normalized to 0 - 1 by dividing by 255. The output isn't.
It's a kind of regression problem, where I would like to find the connection between the derived product and the original input, like reverse engineering.
Could you suggest a basic model that would fit this problem space?
What kind of loss function should I use in the case mentioned above:
#model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
#model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae', 'mape'])
Shall I normalize the output set as well?
My current code:
import os
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, UpSampling2D, Dense
from PIL import Image
X = []
Y = []

# Load the 5x5 RGB satellite patches (inputs)
train_folder = 'D:/satellite/data/train/'
files = os.listdir(train_folder)
for file in files:
    if file.endswith('.png'):
        img = Image.open(train_folder + file)
        X.append(np.array(img))
X = np.array(X)
X = X / 255  # scale inputs to the 0 - 1 range

# Load the single-band canopy height rasters (targets)
train_folder_y = 'D:/canopy/data/train/'
files = os.listdir(train_folder_y)
for file in files:
    if file.endswith('.png'):
        img = Image.open(train_folder_y + file)
        Y.append(np.array(img))
Y = np.array(Y)

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25)
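(Related to the normalization question above: if scaling the output is also advisable, I assume it would be something like this before the split, keeping the factor so predictions can be converted back to real canopy heights?)
y_scale = Y.max()  # remember the scale factor to convert predictions back later
Y = Y / y_scale    # targets now in the 0 - 1 range, like the inputs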
(Image samples 2496.png and 159.png omitted.)
Note: I'm using .png since I don't want to lose information to JPEG compression.
Proposed base model code
from PIL import Image
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, UpSampling2D, Dense
# Load the first image
image1 = Image.open("image1.png")
# Convert the image to a numpy array
image1_array = np.array(image1)
# print(image1_array.shape) # (5, 5, 3)
# Load the second image (ground truth)
image2 = Image.open("image2.png")
# Convert the image to a numpy array
image2_array = np.array(image2)
# print(image2_array.shape)  # (5, 8)
image2_array = image2_array.flatten()  # flattened to (40,) to match the Dense(40) output
# Create your Keras model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=image1_array.shape))  # -> (5, 5, 32)
# Add more layers as needed based on your model architecture
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))    # -> (5, 5, 64)
model.add(UpSampling2D((2, 2)))                                     # -> (10, 10, 64)
model.add(Conv2D(3, (3, 3), activation='sigmoid', padding='same'))  # bottleneck -> (10, 10, 3)
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))   # -> (10, 10, 128)
model.add(UpSampling2D((2, 2)))                                     # -> (20, 20, 128)
model.add(Conv2D(3, (3, 3), activation='sigmoid', padding='same'))  # bottleneck -> (20, 20, 3)
model.add(Flatten())                                                # flatten to 20*20*3 = 1200 values
model.add(Dense(40, activation='relu'))                             # 40 outputs = the flattened (5, 8) target
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Reshape the input arrays
image1_array = image1_array.reshape((1,) + image1_array.shape)
image2_array = image2_array.reshape((1,) + image2_array.shape)
# Train the model using image1 as input and image2 as the ground truth
model.fit(image1_array, image2_array, epochs=1000, batch_size=1)
When running it, the loss converges around 164.4250; you will need to play around with the training parameters.
Epoch 1000/1000
1/1 [==============================] - 0s 2ms/step - loss: 164.4250
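To get a prediction back in the original (5, 8) layout, the flat 40-value output can be reshaped (a small sketch, here just reusing the single training sample):
pred = model.predict(image1_array)  # shape (1, 40)
canopy_map = pred[0].reshape(5, 8)  # back to the (5, 8) shape of the ground truth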
Next steps could be to increase the number of filters (I added a bottleneck in the architecture, which definitely hurts at the moment), or to decrease the learning rate, since we get caught in a local optimum fairly quickly with the current setup.
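For example, a lower learning rate can be passed to Adam explicitly; 1e-4 below is just a starting point to experiment with, not a tuned value (older Keras versions use the lr argument instead of learning_rate):
from keras.optimizers import Adam
model.compile(optimizer=Adam(learning_rate=1e-4), loss='mean_squared_error')
model.fit(image1_array, image2_array, epochs=1000, batch_size=1)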