
expected dense_4 to have 2 dimensions, but got array with shape (1449, 480, 640, 1)


I'm trying to design a convolutional network to estimate the depth of images using Keras.

I have RGB input images with shape (1449, 480, 640, 3) and grayscale output depth maps with shape (1449, 480, 640, 1), but I get stuck at the end, when designing the final layers with a Dense layer.

I get this error: "expected dense_4 to have 2 dimensions, but got array with shape (1449, 480, 640, 1)"

According to the Keras docs, the input to a Dense layer is a 2D array of shape (batch_size, units), so we have to change the dimension of the output received from the convolution layers into a 2D array.

After reshaping my gt ndarray from 4D to 2D with gt = gt.reshape(222566400, 2), it still doesn't work; it shows me this error: "expected dense_4 to have shape (4070,) but got array with shape (2,)"

I understand that there are 4070 dense neurons per sample rather than one for each of the 480*640 positions. How can I reshape the output array to fit a Dense layer, depending on its number of neurons? Note that I have two Dense layers after each other.
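To make the mismatch concrete, here is the shape arithmetic behind the two errors (gt2 is just an illustrative name):

import numpy as np

gt = np.zeros((1449, 480, 640, 1))
print(gt.size)      # 445132800 values in total

# A Dense head expects 2D targets of shape (batch_size, units), so a
# Dense(4070) output layer needs targets of shape (1449, 4070). Each
# depth map, however, holds 480 * 640 = 307200 values, so no reshape
# of gt can yield 4070 values per sample.
print(480 * 640)    # 307200, not 4070

# reshape(222566400, 2) preserves the element count (222566400 * 2 ==
# 445132800) but produces 222566400 "samples" of 2 values each, which
# is why Keras then complains about shape (2,) instead of (4070,).
gt2 = gt.reshape(222566400, 2)
print(gt2.shape)    # (222566400, 2)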


My code:

import numpy as np
import h5py  # For .mat files
# data path
path_to_depth ='/content/drive/My Drive/DataSet/nyu_depth_v2_labeled.mat'

# read mat file
f = h5py.File(path_to_depth,'r')


pred = np.zeros((1449,480,640,3))  # RGB input images
gt = np.zeros((1449,480,640,1))    # grayscale depth targets

for i in range(len(f['images'])):
  # read i-th image. original format is [3 x 640 x 480], uint8
  img = f['images'][i]

  # reshape
  img_ = np.empty([480, 640, 3])
  img_[:,:,0] = img[0,:,:].T
  img_[:,:,1] = img[1,:,:].T
  img_[:,:,2] = img[2,:,:].T


  # read corresponding depth (aligned to the image, in-painted) of size [640 x 480], float64
  depth = f['depths'][i]

  depth_ = np.empty([480, 640])
  depth_[:,:] = depth[:,:].T


  pred[i,:,:,:] = img_ 
  #print(pred.shape)#(1449,480,640,3)

  gt[i,:,:,0] = depth_ 
  #print(gt.shape)#(1449, 480, 640, 1)

# dimensions of our images (480 rows = height, 640 columns = width)
img_width, img_height = 480, 640


# 4D -> 2D reshape attempt described above (triggers the dense_4 shape error)
gt=gt.reshape(222566400,2)
gt = gt.astype('float32')

from keras.preprocessing.image import ImageDataGenerator #import library to preprocess the dataset
from keras.models import Sequential #import keras models libraries
from keras.layers import Conv2D, MaxPooling2D ,BatchNormalization#import layers libraries
from keras.layers import Activation, Dropout, Flatten, Dense #import layers libraries
from sklearn.metrics import classification_report, confusion_matrix #import validation functions
import tensorflow as tf

#Training
model = Sequential() #model type initialization

#conv1
model.add(Conv2D(96, (11, 11),padding='VALID', strides=4,input_shape=(img_width, img_height, 3))) #input layer
model.add(Activation('relu'))

model.add(BatchNormalization(axis=1))

#pool1 
model.add(MaxPooling2D(pool_size=(3, 3),padding='VALID')) #Pooling Layer: reduces the matrices

#conv2
model.add(Conv2D(256, (5, 5),padding='SAME'))
model.add(Activation('relu'))
model.add(BatchNormalization(axis=1)) 

#conv3
model.add(Conv2D(384, (3, 3),padding='SAME'))
model.add(Activation('relu'))

#conv4
model.add(Conv2D(384, (3, 3),padding='SAME',strides=2))
model.add(Activation('relu'))

#conv5
model.add(Conv2D(256, (3, 3),padding='SAME'))
model.add(Activation('relu'))

#pool2
model.add(MaxPooling2D(pool_size=(3, 3),padding='VALID')) #Pooling Layer: reduces the matrices

model.add(Flatten()) #flattens the 3D feature maps into a 1D vector
model.add(Dense(4096,activation='sigmoid')) #densely connected layer

model.add(Dropout(0.5)) #dropout layer to reduce overfitting


model.add(Dense(4070,activation='softmax')) #densely connected layer

#Model configuration for training
model.compile(loss='binary_crossentropy', #A loss function calculates the error in prediction
              optimizer='rmsprop',        #The optimizer updates the weight parameters to minimize the loss function
              metrics=['accuracy'])       #A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.

model.fit(pred,gt,batch_size=9,epochs=161,verbose=1, validation_split=0.1) 


Solution

  • I guess your architecture has some problems. If I understood correctly, what you want in the output should be of size (1449, 480, 640, 1).

    First of all, your last layer's activation is a softmax while your loss is set to 'binary_crossentropy', which really does not make sense. Additionally, you have another Dense layer before that with a sigmoid activation. Is there a reason for that? Why do you have two Dense layers connected together?

    Coming back to your problem, this architecture you have does not really solve your issue. What you need is a Autoenocoder-ish structure. To do so, I would suggest after you flatten the results of your convolutions, add some more layers to UPSAMPLE following by Conv layers, and manage it in a way to get to the output size of (1449,480,640,1). Since you want it grayscale ( i imagine you mean each pixel should be 0 or 1), i suggest to use sigmoid for the last layer activation and then use the binary cross-entropy for the loss