
How to add top layers to a pre-trained functional model


I'm trying to create a ResNet50 model using Keras to predict cats vs. dogs. I decided to just work with a 1000-point subset of the data, with a 700-150-150 train-validation-test split. (I know it's small, but it's what my computer can handle.) I've imported the model using

resnet_model = keras.applications.ResNet50(include_top=False, input_tensor=None, input_shape=None, pooling=None, classes=2)
resnet_model.compile(Adam(lr=.0001), loss='categorical_crossentropy', metrics=['accuracy'])

But when I try to fit it with

aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
  width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
  horizontal_flip=True, fill_mode="nearest")

resnet_model.fit_generator(aug.flow(X_train, y_train, batch_size = batches), steps_per_epoch = len(X_train) // batches,
                          validation_data = (X_valid, y_valid), validation_steps = 4, epochs = 10, verbose = 1)

I get the following value error:

ValueError: Error when checking target: expected activation_352 to have 4 dimensions, but got array with shape (150, 2)

The (150, 2) array is clearly coming from y_valid, but I don't know why that particular output should have 4 dimensions; that's supposed to be a label vector, not a 4-D image tensor. Can someone help me work out how to get the model to recognize this input?
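For reference, the four dimensions the model expects can be seen directly by checking its output shape: with include_top=False, ResNet50 ends at its last convolutional block and emits a 4-D feature map, not a 2-D label tensor. (The weights=None below is just to skip the weight download and isn't in my original code.)

```python
from tensorflow import keras

# Without the classification head, ResNet50's final output is a 4-D
# feature map of shape (batch, height, width, channels).
model = keras.applications.ResNet50(include_top=False, weights=None,
                                    input_shape=(224, 224, 3))
print(model.output_shape)  # (None, 7, 7, 2048) -- incompatible with (150, 2) labels
```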

Note: I know that Daniel Möller mentions here that I need to add a Flatten() layer, but the nature of the functional model and its call hardly seems to allow for that, unless I want to rewrite the entire ResNet from scratch (which seems to defeat the purpose of having a reusable pre-trained model). Any insight would be appreciated.


Solution

  • After reviewing Möller's comments and the code from Yu-Yang here, I was able to re-formulate the top of the model using the following code:

    from keras.layers import Flatten, Dense
    from keras.models import Model

    pre_resnet_model = keras.applications.ResNet50(include_top=False, weights='imagenet', input_tensor=None, input_shape=(224,224,3), pooling=None, classes=2)
    # Freeze the pre-trained convolutional base
    for layer in pre_resnet_model.layers:
        layer.trainable = False
    flatten = Flatten()(pre_resnet_model.output)
    output = Dense(2, activation='softmax')(flatten)
    resnet_model = Model(pre_resnet_model.input, output)
    

    The Flatten layer flattens the convolutional output, and the new Dense output layer draws on that. I'm not yet sure why Model() only requires a ResNet50().input and an output: it clearly doesn't require a listing of all the layers in between, so how did it pick up the Flatten()? I'll take a look at the documentation, but in the meantime, if someone wanders by with a clear explanation, I'll take it.
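Partially answering my own follow-up: as I understand it, the functional API builds the model by tracing the graph of layer calls backward from the given output tensor to the input, so every layer on that path, including the Flatten(), is collected automatically. A quick sanity check of that reading (same construction as above, with weights=None just to skip the download):

```python
from tensorflow import keras
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

pre = keras.applications.ResNet50(include_top=False, weights=None,
                                  input_shape=(224, 224, 3))
x = Flatten()(pre.output)
out = Dense(2, activation='softmax')(x)

# Model() walks the graph from `out` back to `pre.input`, so the
# intermediate Flatten layer is included without being listed.
model = Model(pre.input, out)
print([layer.__class__.__name__ for layer in model.layers[-2:]])  # ['Flatten', 'Dense']
print(model.output_shape)  # (None, 2)
```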