How to understand the dimensions of the result array after using model.predict()

I'm reproducing some code for item retrieval, but when I debug the model.predict call, I find that its input has shape (1, 224, 224, 3) while its output has shape (1, 7, 7, 2048). Shouldn't model.predict() return a 1D array giving the probability that the object belongs to each category, rather than a 4D array? How should I interpret the dimensions of this result?

    model_features = model.predict(x, batch_size=1)

The relevant code follows (this is only part of the full program and may not run as is):

import keras.applications.resnet50
import numpy as np
import os
import pickle
import time
import vse
from keras.preprocessing import image
from keras.models import Model, load_model

model = keras.applications.resnet50.ResNet50(include_top=False)
model_extension = "resnet"

def extract_features_cnn(img_path):
    """Returns a normalized features vector for image path and model specified in parameters file """
    print('Using model', model_extension)
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    if model_extension == "vgg19":
        x = keras.applications.vgg19.preprocess_input(x)
    elif model_extension == "vgg16":
        x = keras.applications.vgg16.preprocess_input(x)
    elif model_extension == "resnet":
        x = keras.applications.resnet50.preprocess_input(x)
    else:
        print('Wrong model name')
    model_features = model.predict(x, batch_size=1)
    x = model_features[0]
    total_sum = sum(model_features[0])
    features_norm = np.array([val / total_sum for val in model_features[0]], dtype=np.float32)
    if model_extension == "resnet":
        print("reshaping resnet")
        features_norm = features_norm.reshape(2048, -1)
    return features_norm

Solution

Your question is not entirely clear, but I will explain as much as I can. With include_top=False, your model contains only the convolutional part of ResNet50; it has no final fully connected (dense) layer, which is the layer that produces class probabilities. The result is indeed 4D, but it is a feature map rather than a vector of probabilities. In the output shape (1, 7, 7, 2048), the 1 is the batch size: you gave the network one image and got one result. The two 7s are the spatial size of the output, 7x7, and 2048 is the number of output channels.

If you want class probabilities, you need a fully connected layer at the end of the ResNet network. You can get it with the argument include_top=True, and the number of classes is set with the argument classes (1000 by default, which is also what the pretrained ImageNet weights require).
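To make the shapes concrete, here is a minimal NumPy sketch of what the missing top of the network does. It is not the Keras implementation: the feature map is random data standing in for the output of model.predict, and the dense layer's weights and bias are hypothetical, untrained values chosen only to show how (1, 7, 7, 2048) turns into (1, 1000).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for model.predict(x) with include_top=False:
# (batch, height, width, channels) = (1, 7, 7, 2048)
feature_map = rng.random((1, 7, 7, 2048), dtype=np.float32)

# Global average pooling: average each channel over the 7x7 grid,
# leaving one 2048-dimensional feature vector per image.
pooled = feature_map.mean(axis=(1, 2))

# Hypothetical, untrained dense layer mapping 2048 features to 1000 classes.
num_classes = 1000
weights = rng.standard_normal((2048, num_classes)).astype(np.float32) * 0.01
bias = np.zeros(num_classes, dtype=np.float32)
logits = pooled @ weights + bias

# Softmax turns the scores into probabilities that sum to 1 per image.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(feature_map.shape)  # feature map from the convolutional part
print(pooled.shape)       # one feature vector per image
print(probs.shape)        # one probability per class
```

In Keras, include_top=True adds the pooling and dense layers for you. If you only want a feature vector for retrieval rather than class probabilities, keeping include_top=False and passing pooling='avg' should give an output of shape (1, 2048) for a single image.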

Here is the documentation.