I am labeling my images as "0" and "1" based on the presence of a human. When I pass all of my images at once and try to train my model, I get a memory error.
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
import tensorflow.keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau, CSVLogger, EarlyStopping
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.applications.resnet50 import ResNet50
from PIL import Image
import os
import numpy as np
train_x=[]
train_y=[]
# Label 1: images containing a human
for path in os.listdir('C:\\Users\\maini_\\Desktop\\TestAndTrain\\in\\train'):
    img = Image.open('C:\\Users\\maini_\\Desktop\\TestAndTrain\\in\\train\\' + path)
    train_x.append(np.array(img))
    train_y.append(1)
    img.close()
# Label 0: images without a human
for path in os.listdir('C:\\Users\\maini_\\Desktop\\TestAndTrain\\notin\\train'):
    img = Image.open('C:\\Users\\maini_\\Desktop\\TestAndTrain\\notin\\train\\' + path)
    train_x.append(np.array(img))
    train_y.append(0)
    img.close()
print("done")
train_x = np.array(train_x)
train_x = train_x.astype(np.float32)
train_x /= 255.0
train_y = np.array(train_y)
I am working with
You've tried to pass 3094 images of size 720x1280 into your model as one single batch, roughly 31.9 GB of data in total. Your GPU cannot physically store and process all of that at once; you need to use batches.
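To see where that figure comes from, here is the quick arithmetic, assuming 3-channel RGB images stored as float32 like your train_x array:

images = 3094
height, width, channels = 720, 1280, 3
bytes_per_value = 4  # float32
total = images * height * width * channels * bytes_per_value
print(total / 1024**3)  # ~31.9 (GiB)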
Since you will run into this problem every time you try to load the whole dataset into memory, I recommend using ImageDataGenerator() and flow_from_directory(), which load the images for training automatically, batch by batch. A typical way to set this up is as follows:
train_datagen = ImageDataGenerator(
    rescale=1./255,          # normalize pixel values to [0, 1]
    validation_split=0.3)    # splits the data 70/30 for training and validation

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    color_mode='grayscale',  # use 'rgb' if your model (e.g. ResNet50) expects 3 channels
    target_size=(img_height, img_width),  # note: target_size is (height, width)
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,
    subset='training')

validation_generator = train_datagen.flow_from_directory(
    train_data_dir,
    color_mode='grayscale',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,
    subset='validation')
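For completeness, here is one way the generators could be wired to the ResNet50 imports from your question. This is a sketch, not part of the original setup: it assumes color_mode='rgb' (ResNet50 expects 3-channel input) and the same img_height and img_width as above.

# Minimal sketch: ResNet50 base with a 2-class head matching class_mode='categorical'
base = ResNet50(weights='imagenet', include_top=False,
                input_shape=(img_height, img_width, 3))
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(2, activation='softmax')(x)  # two classes: in / notin
model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])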
Then, to fit the model, you call the model.fit_generator() method:
model.fit_generator(train_generator, epochs=epochs, callbacks=callbacks, validation_data=validation_generator)
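Note that in TensorFlow 2.x, fit_generator() is deprecated and model.fit() accepts generators directly, so the equivalent modern call is:

model.fit(train_generator, epochs=epochs, callbacks=callbacks, validation_data=validation_generator)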
This is the best way to handle large amounts of image data when training models in Keras, because each batch is generated (or "flowed") from the directory on demand rather than the whole dataset being loaded into memory up front. The only caveat is that the expected directory layout differs slightly from what you currently have. You will need to restructure it to:
TestAndTrain
  -Train
    -in
    -notin
  -Test
    -in
    -notin
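If it helps, a one-off script along the lines below could move the files into that layout. This is a sketch, not part of the original answer: it assumes your test images live in in\test and notin\test alongside the train folders from your question, and it moves files, so double-check the paths before running it.

import os
import shutil

base = 'C:\\Users\\maini_\\Desktop\\TestAndTrain'

# Move from the old class/split layout (in\train, ...) to the new split/class layout (Train\in, ...)
for cls in ('in', 'notin'):
    for old_split, new_split in (('train', 'Train'), ('test', 'Test')):
        src = os.path.join(base, cls, old_split)
        dst = os.path.join(base, new_split, cls)
        os.makedirs(dst, exist_ok=True)
        for name in os.listdir(src):
            shutil.move(os.path.join(src, name), os.path.join(dst, name))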