Tags: python, keras, conv-neural-network, autoencoder

Keras Convolutional Autoencoder blank output


Quick disclaimer: I'm pretty new to Keras, machine learning, and programming in general.

I'm trying to create a basic autoencoder for (currently) a single image. While it seems to run just fine, the output is just a white image. Here's what I've got:

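# imports assumed from standalone Keras (the original snippet omitted them)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.image import load_img, img_to_array, array_to_img, save_img

img_height, img_width = 128, 128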

input_img = '4.jpg'
output_img = '5.jpg'

# load image
x = load_img(input_img)
x = img_to_array(x)  # array with shape (128, 128, 3)
x = x.reshape((1,) + x.shape)  # array with shape (1, 128, 128, 3)

# define input shape
input_shape = (img_height, img_width, 3)

model = Sequential()
# encoding
model.add(Conv2D(128, (3, 3), activation='relu', input_shape=input_shape,
                 padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))

# decoding
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D(size=(2,2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D(size=(2,2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(3, (3, 3), activation='sigmoid', padding='same'))

model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()  # prints the summary itself; wrapping it in print() also prints "None"

checkpoint = ModelCheckpoint("autoencoder-loss-{loss:.4f}.hdf5", monitor='loss',
                             verbose=0, save_best_only=True, mode='min')
model.fit(x, x, epochs=10, batch_size=1, verbose=1, callbacks=[checkpoint])

y = model.predict(x)

y = y[0, :, :, :]
y = array_to_img(y)
save_img(output_img, y)

I've looked at a handful of tutorials for reference, but I still can't figure out what my issue is.

Any guidance/suggestions/help would be greatly appreciated.

Thanks!


Solution

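  • The code was missing one preprocessing step, and adding it solved the problem:

    x = x.astype('float32') / 255.

    astype is a NumPy array method that casts the array's values to 32-bit floats; the division then rescales them.

    RGB channel values are stored as 8-bit integers from 0 to 255 (2^8 - 1), so dividing by 255 represents each one as a decimal value between 0.0 and 1.0. That is the range this model needs: the final sigmoid layer can only output values between 0 and 1, and binary_crossentropy assumes targets in the same range. With targets as large as 255, the loss pushes the output toward 1 wherever the target is above 1, which is consistent with the all-white image.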
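To make the effect of the scaling concrete, here is a minimal NumPy-only sketch. It uses a random array as a stand-in for the loaded image, and the helper names normalize, denormalize, and bce are hypothetical, introduced only for this illustration. The bce function is the textbook per-element binary cross-entropy formula, not Keras's own implementation (which, among other things, clips predictions for numerical stability), so treat the numbers as indicative only.

```python
import numpy as np

def normalize(pixels):
    """Scale 8-bit pixel values (0-255) to float32 values in [0.0, 1.0]."""
    return pixels.astype('float32') / 255.

def denormalize(unit):
    """Map floats in [0.0, 1.0] back to 8-bit pixel values (hypothetical helper)."""
    return np.clip(np.rint(unit * 255.), 0, 255).astype('uint8')

def bce(target, pred):
    """Textbook per-element binary cross-entropy (illustration only)."""
    return -(target * np.log(pred) + (1. - target) * np.log(1. - pred))

# Stand-in for the loaded image batch: random 8-bit RGB values, shape (1, 128, 128, 3).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(1, 128, 128, 3), dtype=np.uint8)

x = normalize(raw)
print(x.dtype, x.min(), x.max())

# Unnormalized target (200): the loss keeps falling as the prediction nears 1,
# so training drives the sigmoid output toward saturation.
print(bce(200., 0.5), bce(200., 0.999))

# Normalized target (200/255): the loss is lowest when the prediction matches it.
t = 200. / 255.
print(bce(t, 0.5), bce(t, t), bce(t, 0.999))
```

The same scaling has to be undone (or handled by the saving function) when writing the prediction back to an image file, which is what denormalize sketches here.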