matlab image-processing machine-learning deep-learning conv-neural-network

How to format training data as a 4-D array for a neural network in MATLAB: the proper way to input data


My dataset consists of 1000 RGB images, each 100x40 pixels. I stored them in Xdata, which is 1x1x1000 with data type double.

Of these, I used the first 700 for training: Xtrain is 1x1x700 with data type Image.

I am getting this error:

Error using trainNetwork (line 150)
    Invalid training data. X must be a 4-D array of images, an ImageDatastore, or a table.

I cannot understand how to use the table data structure. What is the proper way to input the data into a CNN? Is it not possible to input an RGB image directly as an image data type, or do I need to split each channel and feed three 2-D matrices?

imageSize = [100 40];


dropoutProb = 0.1;
numF = 8;
layers = [
    imageInputLayer(imageSize)

    convolution2dLayer(3,numF,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(3,'Stride',2,'Padding','same')

    convolution2dLayer(3,2*numF,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(3,'Stride',2,'Padding','same')

    convolution2dLayer(3,4*numF,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(3,'Stride',2,'Padding','same')

    convolution2dLayer(3,4*numF,'Padding','same')
    batchNormalizationLayer
    reluLayer
    convolution2dLayer(3,4*numF,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer([1 13])

    dropoutLayer(dropoutProb)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

miniBatchSize = 50;
validationFrequency = floor(numel(Ytrain)/miniBatchSize);
options = trainingOptions('adam', ...
    'InitialLearnRate',3e-4, ...
    'MaxEpochs',25, ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',validationFrequency, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.1, ...
    'LearnRateDropPeriod',20);

  trainedNet = trainNetwork(Xtrain,layers,options);

Solution

  • The input dimensions are wrong. The 4-D array should have the shape:

    [height, width, number_of_channels, number_of_images]
    

    So in your case the training image array needs the dimensions:

    [100, 40, 3, 700]
    

    And the test image array needs the dimensions:

    [100, 40, 3, 300]
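
    A minimal sketch of building arrays with these shapes, assuming the images sit in a cell array with one 100x40x3 matrix per cell. The name imgs and the random pixel values are hypothetical stand-ins for the real data. Concatenating along the fourth dimension with cat does the stacking:

    ```matlab
    % Hypothetical stand-in for the real data: 1000 random RGB images.
    numImages = 1000;
    imgs = cell(1, numImages);
    for k = 1:numImages
        imgs{k} = rand(100, 40, 3);   % height x width x channels
    end

    % Stack along the 4th dimension: height x width x channels x numImages.
    Xdata = cat(4, imgs{:});

    % First 700 images for training, remaining 300 for testing.
    Xtrain = Xdata(:, :, :, 1:700);
    Xtest  = Xdata(:, :, :, 701:end);

    disp(size(Xtrain))   % expected size: 100 40 3 700
    disp(size(Xtest))    % expected size: 100 40 3 300
    ```

    With a 4-D array, the labels go in as a separate categorical vector, i.e. trainNetwork(Xtrain,Ytrain,layers,options), and the input layer needs the channel count as well: imageInputLayer([100 40 3]).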
    

    You also have a dropout layer directly before the final, and only, fully connected layer. Should there be an additional fully connected layer before it? As it stands, dropout is applied straight to your max pooling results and throws part of them away, which can be done but is quite aggressive.
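
    One possible way to arrange the end of the network with an extra fully connected layer ahead of the dropout is sketched below. The hidden width numHidden and the value of numClasses are assumptions for illustration, not values from the question, and this needs the Deep Learning Toolbox:

    ```matlab
    numClasses  = 10;    % hypothetical; use the number of categories in Ytrain
    dropoutProb = 0.1;   % value from the question
    numHidden   = 64;    % hypothetical width of the added layer

    % Tail of the layer array: the new fully connected layer and its ReLU
    % sit between the last pooling layer and the dropout layer.
    tailLayers = [
        fullyConnectedLayer(numHidden)
        reluLayer
        dropoutLayer(dropoutProb)
        fullyConnectedLayer(numClasses)
        softmaxLayer
        classificationLayer];
    ```

    These six layers would replace the last four layers of the array in the question. Whether the extra layer helps is something to check against validation accuracy.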

    trainNetwork() can also take other inputs if you don't specifically want to use a 4-D array. I prefer an augmentedImageDatastore built from an imageDatastore: it is a very easy way to augment your images, which you definitely should be doing if you aren't already. If you stay with arrays, consider changing the image data type from double to uint8. Three uint8 channels are enough to represent a typical input image completely, and the smaller arrays should cut memory use and may speed up your training.
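
    A rough sketch of both suggestions together, assuming the images are already in a 4-D double array with values in [0, 1]. The random data, the two-class labels and the augmentation settings are all hypothetical. augmentedImageDatastore also accepts an imageDatastore in place of the array and labels:

    ```matlab
    % Stand-in data (assumption): 700 random RGB images stored as double in [0, 1].
    Xdouble = rand(100, 40, 3, 700);
    Ytrain  = categorical(randi(2, 700, 1));   % hypothetical two-class labels

    % Convert to uint8. Scaling by 255 assumes the doubles lie in [0, 1];
    % if they already span 0..255, uint8(Xdouble) alone is enough.
    Xtrain = uint8(255 * Xdouble);

    % Hypothetical augmentation choices; pick ones that suit your images.
    augmenter = imageDataAugmenter( ...
        'RandXReflection', true, ...
        'RandXTranslation', [-3 3], ...
        'RandYTranslation', [-3 3]);

    % Wrap the array and labels. An imageDatastore can be passed instead:
    % augmentedImageDatastore([100 40], imds, 'DataAugmentation', augmenter)
    augTrain = augmentedImageDatastore([100 40], Xtrain, Ytrain, ...
        'DataAugmentation', augmenter);

    % The datastore carries the labels, so they are not passed separately:
    % trainedNet = trainNetwork(augTrain, layers, options);
    ```

    The uint8 array takes one byte per value instead of eight for double, so the same 700 images need an eighth of the memory.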