Tags: python, keras, pytorch, autoencoder, mnist

Why do we need to pre-process image datasets?


This question refers to this Complete guide on How to use Autoencoders in Python.

Notice that the author adds the following:

import numpy as np
from keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()

# preprocessing applied right after loading MNIST
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

after they loaded the MNIST data.

Why do they divide the image data by 255, and why 255 specifically? And after that, why do they reshape the 2D image matrices into 1D vectors?

Thank you so much!


Solution

    • Why divide by 255:
      MNIST pixel intensities are stored as 8-bit integers, so each value lies between 0 and 255. Dividing by 255 normalizes them to the range [0, 1], which is the scale neural networks typically train on most comfortably (see the sketch just below).
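      A minimal sketch of the effect, using a random uint8 array as a stand-in for the real MNIST images (an assumption purely for illustration):

import numpy as np

# 8-bit pixel intensities lie between 0 and 255
images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# dividing by 255 rescales them to floats in [0, 1]
scaled = images.astype('float32') / 255.
print(scaled.min(), scaled.max())  # both values now fall between 0.0 and 1.0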

    The reshape to a 1D vector is there so that each image can be fed straight into a fully connected (Dense) input layer, which expects one flat feature vector per sample; a 28×28 image therefore becomes a vector of 784 values. If you keep the 2D shape instead, you need layers designed for it, such as the convolutional (Conv2D) layers used in convolutional autoencoders, which take the image grid (plus a channel dimension) directly as input. A short sketch contrasting the two follows below.
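To make the flattening point concrete, here is a minimal sketch (the layer sizes and variable names are illustrative assumptions, not taken from the linked guide): a Dense layer consumes one flat 784-element vector per image, while a Conv2D layer accepts the 28x28 grid directly, with an extra channel axis.

import numpy as np
from keras import layers, models

x = np.zeros((10, 28, 28), dtype='float32')         # 10 dummy grayscale images
x_flat = x.reshape((len(x), np.prod(x.shape[1:])))  # shape becomes (10, 784)

# Dense layers expect flat feature vectors ...
dense_encoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(32, activation='relu'),
])

# ... whereas Conv2D layers keep the 2D image shape plus a channel axis
conv_encoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
])

print(dense_encoder(x_flat).shape)              # (10, 32)
print(conv_encoder(x[..., np.newaxis]).shape)   # (10, 28, 28, 16)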