python tensorflow transfer-learning conv-neural-network

Any suggestions to improve my CNN model (always the same low test accuracy)?

I am working on a project to detect the presence of a person in a painting. I have 4000 training images and 1000 test images, all resized to (256, 256, 3).

I tried a CNN model with three blocks of (Conv2D, MaxPooling2D, BatchNormalization), followed by two fully connected hidden layers and a single sigmoid output:

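```python
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPooling2D)
from tensorflow.keras.models import Sequential

shape = (256, 256, 3)  # every image is resized to 256x256 RGB

model = Sequential()
# Block 1: 32 filters with a 7x7 kernel
model.add(Conv2D(32, kernel_size=(7, 7), activation='relu', input_shape=shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
# Block 2: 64 filters with a 7x7 kernel
model.add(Conv2D(64, kernel_size=(7, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
# Block 3: 96 filters with a 5x5 kernel
model.add(Conv2D(96, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())

# Classifier head: two hidden Dense layers, then one sigmoid unit for the binary label
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
```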
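A rough way to see why the training accuracy reaches 1 is to count where this model's parameters are. The sketch below traces the feature-map size by hand, assuming the Keras defaults used above (`padding='valid'` and stride 1 for `Conv2D`, pooling stride equal to `pool_size`). The helper names are hypothetical, and BatchNormalization weights are ignored.

```python
def conv_out(size, kernel):
    # padding='valid', stride 1: the output shrinks by kernel - 1
    return size - kernel + 1


def pool_out(size, pool):
    # non-overlapping pooling; a leftover odd row/column is dropped
    return size // pool


# (kernel, input channels, filters) for the three Conv2D layers in the model above
CONV_LAYERS = [(7, 3, 32), (7, 32, 64), (5, 64, 96)]


def trace(input_size):
    # spatial side length after the three Conv2D + MaxPooling2D(2, 2) blocks
    size = input_size
    for kernel, _, _ in CONV_LAYERS:
        size = pool_out(conv_out(size, kernel), 2)
    return size


side = trace(256)
flatten_units = side * side * 96            # length of the Flatten() output
dense1_params = flatten_units * 128 + 128   # weights + biases of Dense(128)
conv_params = sum(k * k * c_in * c_out + c_out for k, c_in, c_out in CONV_LAYERS)
print(side, flatten_units, dense1_params, conv_params)  # 27 69984 8958080 258848
```

So the single `Dense(128)` layer holds about 9 million weights, roughly 35 times more than all three convolutions together, to fit 4000 training images. A layer this large can easily memorize the training set, which is consistent with the gap between training and test accuracy. Halving the input to 128x128 only brings `trace(128)` down to 11 (still about 1.5 million weights in that layer), which may be one reason the smaller images did not change much.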

The training accuracy always converges to 1 within just 20-50 epochs, while the test accuracy stays roughly constant at around 0.67.

I tried the following:

- Changing the size of the layers and adding more layers
- Data augmentation
- Smaller images (128x128x3)

But I always get the same results.
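For reference, one way to do the data augmentation mentioned above in current TensorFlow is with Keras preprocessing layers, which apply random transformations only when called in training mode. This is a sketch, not necessarily what was used here (the solution below uses `ImageDataGenerator`, an older API); the choice of transformations and their strengths are assumptions.

```python
import numpy as np
import tensorflow as tf

# Random transformations applied on the fly; with training=False they pass images through unchanged.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirror paintings left-right
    tf.keras.layers.RandomRotation(0.05),      # rotate by up to 5% of a full turn (18 degrees)
    tf.keras.layers.RandomZoom(0.1),           # zoom in or out by up to 10%
])

images = np.random.rand(4, 256, 256, 3).astype("float32")  # dummy batch of 4 images
augmented = augment(images, training=True)
print(augmented.shape)
```

Because these are ordinary layers, the same `augment` block could also be placed at the start of the model itself, so every epoch sees slightly different versions of the 4000 training images.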

I don't know whether this is due to the small number of images I have, or whether the architecture isn't big enough to learn from complex paintings.

I thought of trying transfer learning, but I don't know whether it will help because this is my first time using it. Also, do you know where I can find pretrained models?

So, I am asking for suggestions to improve my model.

Solution

I tried a frozen VGG16 base with 4 fully connected layers on top, and the validation accuracy went up to 0.83. I am also using ImageDataGenerator.
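A minimal sketch of this kind of setup, assuming `tf.keras.applications.VGG16`. That module also answers where to find trained models: it ships VGG16 and other ImageNet-pretrained networks such as ResNet50 and EfficientNetB0. The layer sizes, the dropout rate, the `GlobalAveragePooling2D` step and the optimizer are assumptions, since the post does not give them, and "4 fully connected layers" is read here as four `Dense` layers including the sigmoid output. The function name `build_transfer_model` is hypothetical, and its `weights` argument lets the model be built without downloading the ImageNet weights.

```python
import tensorflow as tf


def build_transfer_model(input_shape=(256, 256, 3), weights="imagenet"):
    # Convolutional base pretrained on ImageNet, without VGG16's own classifier
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze the base: only the new head is trained

    inputs = tf.keras.Input(shape=input_shape)
    # VGG16's own preprocessing (RGB->BGR, ImageNet mean subtraction) on 0-255 pixels
    x = tf.keras.applications.vgg16.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # 512 features instead of a large Flatten
    # Four Dense layers in total (sizes are hypothetical)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_transfer_model(weights=None)  # weights=None skips the ImageNet download
```

Training then works as with the original model, e.g. `model.fit(train_data, validation_data=val_data, epochs=...)` with generators from `ImageDataGenerator.flow_from_directory`. Since `preprocess_input` expects raw 0-255 pixel values, a generator feeding this model should not also use `rescale=1./255`.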