Tags: python, numpy, machine-learning, generative-adversarial-network

Convert PNG or JPEG images to the format accepted by a GAN algorithm


I am new to the field of GANs and have tried a few tutorials, but most of them use either the CIFAR or MNIST dataset, so the data usually comes in a shape like (xxxx, 28, 28).

Recently, I wanted to try out other pictures. For example:

from scipy import misc  # note: scipy.misc.imread was removed in SciPy 1.2; imageio.imread is the modern replacement
data = misc.imread("1.PNG")  # this can be any image, e.g. a JPEG
print(data.shape)

My output:

(842, 1116, 4)  # Seriously, I don't understand what this means. Does 842 mean 842 files? I thought I loaded only one image.

My expected output:

Since I am new to this, my question is: should the shape be (1, 28, 28) or something else? I want it to fit into the GAN, since the tutorial used 784 inputs.

Normally, the MNIST dataset has shape (60000, 28, 28), which means 60k pictures, each of shape 28x28. What about my output above? (842, 1116, 4) does not mean 842 pictures of shape 1116 x 4, does it? I only loaded one image. Can someone help me convert it and also understand it? Thank you.


Solution

  • imread uses PIL/Pillow to read images. It returns the image in the format height x width x channels, where channels is usually 3 (red, green, blue [RGB] for a normal color image) or sometimes 4 (red, green, blue, alpha/transparency [RGBA]).
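
    To see why the three numbers are height, width, and channels rather than an image count, you can build a dummy array of the same shape (a sketch; the array here is synthetic zeros standing in for the actual PNG):

```python
import numpy as np

# A single RGBA image is a 3-D array: (height, width, channels),
# NOT (number_of_images, height, width).
data = np.zeros((842, 1116, 4), dtype=np.uint8)  # stand-in for the loaded PNG

height, width, channels = data.shape
print(height, width, channels)  # 842 1116 4: 842 pixel rows, 1116 pixel columns, 4 channels (RGBA)
```

    So 842 is not a file count; it is the number of pixel rows in the one image you loaded.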

    So you read an image of size 842x1116 pixels with 4 color channels. You say your training data has the shape (xxxx, 28, 28), so you use grayscale rather than color images. The first step is therefore to convert the color image to grayscale. Pillow (the maintained successor to PIL) is a nice library for image operations. Alternatively, you can simply take one channel:

    gray_data = data[:,:,0]
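
    The Pillow route mentioned above can be sketched like this (the array here is random stand-in data, not the actual PNG; `convert("L")` does a proper luminosity-weighted conversion instead of just dropping channels):

```python
import numpy as np
from PIL import Image

# Synthetic RGBA array standing in for the loaded "1.PNG" (hypothetical data)
rgba = np.random.randint(0, 256, size=(842, 1116, 4), dtype=np.uint8)

# Proper grayscale conversion via Pillow: "L" is 8-bit grayscale,
# computed as a weighted sum of R, G and B rather than a single channel.
img = Image.fromarray(rgba)          # Pillow infers RGBA from the 4-channel uint8 array
gray = np.array(img.convert("L"))

print(gray.shape)  # (842, 1116) - the channel axis is gone
```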
    

    To use it as training data you can now either resize it to 28x28 or extract small 28x28 patches from it:

    small_data = gray_data[:28,:28]
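
    The resize alternative could look like this (a sketch on random stand-in data; note that Pillow's `resize` takes (width, height), not (height, width)):

```python
import numpy as np
from PIL import Image

# Stand-in grayscale image of the same size as the converted PNG
gray_data = np.random.randint(0, 256, size=(842, 1116), dtype=np.uint8)

# Resizing keeps the whole (downsampled) picture,
# unlike cropping only the top-left 28x28 corner:
small = np.array(Image.fromarray(gray_data).resize((28, 28)))

print(small.shape)  # (28, 28)
```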
    

    This yields shape (28, 28). Most learning algorithms expect not one but several images, usually in the format (#images, height, width), so you need to reshape it:

    final_data = small_data.reshape(1,28,28)
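
    Once you collect several such 28x28 images, stacking them produces the (#images, height, width) format, and flattening each one gives the 784-element vectors the tutorial feeds to the GAN (a sketch with random stand-in patches):

```python
import numpy as np

# Two stand-in 28x28 grayscale patches (hypothetical data)
patch_a = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)
patch_b = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

batch = np.stack([patch_a, patch_b])   # (2, 28, 28): (#images, height, width)
flat = batch.reshape(len(batch), -1)   # (2, 784): one 784-vector per image

print(batch.shape, flat.shape)
```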
    

    That should do it. However, a proper RGB(A)-to-grayscale conversion and a real resize would be the better solution; check the Pillow documentation for details.
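
    Putting the steps above together in one pipeline (a sketch; the RGBA image is generated in memory so the snippet is self-contained, whereas in practice you would start from `Image.open("1.PNG")`):

```python
import numpy as np
from PIL import Image

# Stand-in RGBA image of the question's size (in practice: Image.open("1.PNG"))
rgba = Image.fromarray(np.random.randint(0, 256, size=(842, 1116, 4), dtype=np.uint8))

gray = rgba.convert("L")                     # RGBA -> 8-bit grayscale
small = gray.resize((28, 28))                # full-image resize down to 28x28
final = np.array(small).reshape(1, 28, 28)   # add the batch axis: (#images, height, width)

print(final.shape)  # (1, 28, 28)
```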