Why is ImageDataGenerator() performing poorly?

I am trying to build a image classification model, using ImageDataGenerator(). It seems that the model trains and performs poorly. The training loss stays at around 15 and the accuracy is barely 10%, the validation is about the same.

Just to see what would happen, I tried training without using the ImageDataGenerator() and set up the data in a similar way. It performed much better in training, validation and testing. With training loss of 0.71 and accuracy of 75% and validation loss of 0.8 and accuracy of 72%.

I need to figure out this model with the data generator because I will be moving on to a larger dataset, where it will not fit into memory.

So, I guess my question is what am I doing wrong with the ImageDataGenerator() that it is performing so badly and how can I improve the outcome?

When setting up the files (in all Train, Test, Validation folders), there are the classes with its own folder and in those folders is where the images are.

Here is the code:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import pickle
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D, Dropout

data_gen = ImageDataGenerator()
IMG_SIZE = 100
train_it = data_gen.flow_from_directory('D:/.../Train/', class_mode='sparse',
                                       target_size=(IMG_SIZE, IMG_SIZE),color_mode='grayscale', shuffle=True,batch_size=32)
val_it = data_gen.flow_from_directory('D:/.../Validation/', class_mode='sparse',
                                     target_size=(IMG_SIZE, IMG_SIZE),color_mode='grayscale', shuffle=True,batch_size=32)

IMAGE_SIZE = [100, 100]

model=Sequential()
model.add(Conv2D(32,(3,3), input_shape=[*IMAGE_SIZE, 1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.5))

model.add(Conv2D(32,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.5))

model.add(Conv2D(32,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(len(train_it.class_indices), activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit_generator(train_it, epochs=20, validation_data=val_it )

Here is my code without ImageDataGenerator(): SETUP the data, using OpenCV

DATADIR='D:\...\Train'
CATEGORIES = pickle.load(open("CATEGORIES.p" , "rb"))
print(len(CATEGORIES))
IMG_SIZE = 100
training_data=[]

def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR,category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path,img),cv2.IMREAD_GRAYSCALE)
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
            except:
                print(category)
                print(img)

create_training_data()

random.shuffle(training_data)

X=[]
y=[]
for features, label in training_data:
    X.append(features)
    y.append(label)

X=np.array(X).reshape(-1,IMG_SIZE, IMG_SIZE, 1)
X=X/255.0

MODEL SETUP:

model=Sequential()
model.add(Conv2D(32,(3,3), input_shape=[*IMAGE_SIZE, 1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.5))

model.add(Conv2D(32,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.5))

model.add(Conv2D(32,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(len(CATEGORIES), activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X,y, epochs=20, batch_size=32, validation_split=0.1)

Solution

@acho,

Mentioning the solution to this issue cited by you in the comments, for the benefit of the community.

Reason for the issue is that Input Data is not Normalized by dividing each Pixel Value by 255. It has an impact on Training because of the reasons mentioned below:

It converts Pixel Values from Integers to Float, in a range of 0.0-1.0 where 0.0 means 0 (0x00) and 1.0 means 255 (0xFF). Conv Nets work better on Float Values compared to Integer Values, and by normalizing it in a range of 0-1, computations will be reduced.
Normalization will help you to remove distortions caused by lights and shadows in an image.