My dataset consists of 1000 RGB images of size 100x40. Xdata is therefore a 1x1x1000 array of data type double.
I use the first 700 of these for training, so Xtrain is 1x1x700, of data type Image.
I am getting this error:
Error using trainNetwork (line 150)
Invalid training data. X must be a 4-D array of images, an ImageDatastore, or a table.
I cannot understand how to use the table data structure. What is the proper way to input the data into a CNN? Is it not possible to input an RGB image directly as an image data type, or do I need to split each image into its channels and feed three 2-D matrices?
imageSize = [100 40];
dropoutProb = 0.1;
numF = 8;
layers = [
imageInputLayer(imageSize)
convolution2dLayer(3,numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(3,'Stride',2,'Padding','same')
convolution2dLayer(3,2*numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(3,'Stride',2,'Padding','same')
convolution2dLayer(3,4*numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(3,'Stride',2,'Padding','same')
convolution2dLayer(3,4*numF,'Padding','same')
batchNormalizationLayer
reluLayer
convolution2dLayer(3,4*numF,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer([1 13])
dropoutLayer(dropoutProb)
fullyConnectedLayer(numClasses)
softmaxLayer
classificationLayer];
miniBatchSize = 50;
validationFrequency = floor(numel(Ytrain)/miniBatchSize);
options = trainingOptions('adam', ...
'InitialLearnRate',3e-4, ...
'MaxEpochs',25, ...
'MiniBatchSize',miniBatchSize, ...
'Shuffle','every-epoch', ...
'Plots','training-progress', ...
'Verbose',false, ...
'ValidationData',{XValidation,YValidation}, ...
'ValidationFrequency',validationFrequency, ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropFactor',0.1, ...
'LearnRateDropPeriod',20);
trainedNet = trainNetwork(Xtrain,layers,options);
The input dimensions are wrong. trainNetwork expects the 4-D image array to have the shape:
[height, width, number_of_channels, number_of_images]
So in your case you'd need the train image dimensions to be:
[100, 40, 3, 700]
And test image dimensions to be:
[100, 40, 3, 300]
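As a minimal sketch of how such an array could be built, the snippet below stacks individual images along the fourth dimension with cat. The cell array imgs and its random contents are hypothetical stand-ins, since the question does not show how the images are actually stored.

```matlab
% Hypothetical stand-in for the real data: a cell array of 1000 images,
% each stored as a 100x40x3 (height x width x channels) array.
numImages = 1000;
imgs = cell(1, numImages);
for k = 1:numImages
    imgs{k} = rand(100, 40, 3);   % replace with your own images
end

% Stack along the 4th dimension: height x width x channels x images.
Xdata = cat(4, imgs{:});

% First 700 images for training, the remaining 300 for testing.
Xtrain = Xdata(:, :, :, 1:700);
Xtest  = Xdata(:, :, :, 701:end);

disp(size(Xtrain));
disp(size(Xtest));
```

With data in this layout, the input layer would presumably also need the channel count, i.e. imageInputLayer([100 40 3]), and the numeric-array form of the call takes the labels as a second argument: trainNetwork(Xtrain,Ytrain,layers,options), with Ytrain a categorical vector of 700 labels.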
You also have a dropout layer directly before the final, and only, fully connected layer. Should there be an additional fully connected layer before it? As written, the dropout discards part of your max pooling output, which can be done but is quite aggressive.
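One way the end of the network could look with an extra fully connected layer ahead of the dropout is sketched below. The values of numHidden and numClasses are assumptions chosen for illustration, not values taken from the question, and the snippet needs Deep Learning Toolbox.

```matlab
numClasses  = 10;    % assumption: set this to your number of classes
numHidden   = 64;    % hypothetical width of the extra fully connected layer
dropoutProb = 0.1;   % same dropout probability as in the question

% Tail of the network: dropout now acts on the hidden fully connected
% features instead of directly on the max pooling output.
tailLayers = [
    fullyConnectedLayer(numHidden)
    reluLayer
    dropoutLayer(dropoutProb)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

These layers would replace everything from dropoutLayer onward in the original layers array. Whether the extra layer helps is something to check on the validation set.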
trainNetwork() can also take other inputs if you don't specifically want to use a 4-D array. I prefer an augmented image datastore built from an image datastore. It is a very easy way to augment your images, which you definitely should be doing if you aren't already. If you stay with the array instead, consider changing your image data type from double to uint8. Three uint8 channels are enough to represent a typical input image completely, and it should speed up your training.
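A rough sketch of both suggestions follows, here wrapping a 4-D array directly in an augmentedImageDatastore rather than going through an image datastore. The random data, the labels, and the particular augmentation settings are placeholders, and the uint8 conversion assumes the double pixel values are scaled to [0,1]. It requires Deep Learning Toolbox.

```matlab
% Placeholder data (assumption): 700 RGB images in [0,1] with random labels.
Xtrain = rand(100, 40, 3, 700);
Ytrain = categorical(randi(5, 700, 1));

% Store the images as uint8. Assumes values in [0,1]; uint8() rounds
% to the nearest integer and saturates at 0 and 255.
Xtrain8 = uint8(255 * Xtrain);

% Example augmentations; choose ones that make sense for your images.
augmenter = imageDataAugmenter( ...
    'RandXReflection', true, ...
    'RandXTranslation', [-3 3], ...
    'RandYTranslation', [-3 3]);

% Wrap the array and labels; images are augmented on the fly each epoch.
augTrain = augmentedImageDatastore([100 40], Xtrain8, Ytrain, ...
    'DataAugmentation', augmenter);

% Training would then use the datastore form of the call:
% trainedNet = trainNetwork(augTrain, layers, options);
```

Horizontal reflection is only appropriate when a mirrored image still belongs to the same class, so treat the augmenter settings as a starting point.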