I'm first time building a CNN model for image classification and i'm a little bit confused about what would be the input shape for each type (1D CNN, 2D CNN, 3D CNN) and how to fix the number of filters in the convolution layer. My data is 100x100x30 where 30 are features. Here is my essay for the 1D CNN using the Functional API Keras:
def create_CNN1D_model(pool_type='max',conv_activation='relu'):
input_layer = (30,1)
conv_layer1 = Conv1D(filters=16, kernel_size=3, activation=conv_activation)(input_layer)
max_pooling_layer1 = MaxPooling1D(pool_size=2)(conv_layer1)
conv_layer2 = Conv1D(filters=32, kernel_size=3, activation=conv_activation)(max_pooling_layer1)
max_pooling_layer2 = MaxPooling1D(pool_size=2)(conv_layer2)
flatten_layer = Flatten()(max_pooling_layer2)
dense_layer = Dense(units=64, activation='relu')(flatten_layer)
output_layer = Dense(units=10, activation='softmax')(dense_layer)
CNN_model = Model(inputs=input_layer, outputs=output_layer)
return CNN_model
CNN1D = create_CNN1D_model()
CNN1D.compile(loss = 'categorical_crossentropy', optimizer = "adam",metrics = ['accuracy'])
Trace = CNN1D.fit(X, y, epochs=50, batch_size=100)
However, while trying the 2D CNN model by just changing Conv1D, Maxpooling1D to Conv2D and Maxpooling2D, i got the following error :
ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 30, 1)
Can anyone please tell me how would be the input shape for 2D CNN and 3D CNN ? And what can be done on input data preprocessing?
TLDR; your X_train
can be looked at as (batch, spatial dims..., channels). A kernel applies to the spatial dimensions for all channels in parallel. So a 2D CNN, would require two spatial dimensions (batch, dim 1, dim 2, channels)
.
So for (100,100,3)
shaped images, you will need a 2D CNN that convolves over 100 height and 100 width, over all the 3 channels.
Lets, understand the above statement.
First, you need to understand what CNN (in general) is doing.
A kernel is convolving through the spatial dimensions of a tensor across its feature maps/channels while performing a simple matrix operation (like dot product) to the corresponding values.
Now, Let's say you have 100 images (called batches). Each image is 28 by 28 pixels and has 3 channels R, G, B (which are also called feature maps in context to CNNs). If I were to store this data as a tensor, the shape would be (100,28,28,3)
.
However, I could just have an image that doesn't have any height (may like a signal) OR, I could have data that has an extra spatial dimension such as a video (height, width, and time).
In general, here is how the input for a CNN-based neural network looks like.
The second key point you need to know is, A 2D kernel will convolve over 2 spatial dimensions BUT the same kernel will do this over all the feature maps/channels. So, if I have a (3,3)
kernel. This same kernel will get applied over R, G, B channels (in parallel) and move over the Height
and Width
of the image.
Finally, the operation (for a single feature map/channel and single convolution window) can be visualized like below.
Therefore, in short -
Let's take the example of tensors with single feature maps/channels (so, for an image, it would be greyscaled) -
So, with that intuition, we see that if I want to use a 1D CNN, your data must have 1 spatial dimension, which means each sample needs to be 2D (spatial dimension and channels), which means the
X_train
must be a 3D tensor(batch, spatial dimensions, channels)
.Similarly, for a 2D CNN, you would have 2 spatial dimensions (H, W for example) and would be 3D samples
(H, W, Channels)
andX_train
would be(Samples, H, W, Channels)
Let's try this with code -
import tensorflow as tf
from tensorflow.keras import layers
X_2D = tf.random.normal((100,7,3)) #Samples, width/time, channels (feature maps)
X_3D = tf.random.normal((100,5,7,3)) #Samples, height, width, channels (feature maps)
X_4D = tf.random.normal((100,6,6,2,3)) #Samples, height, width, time, channels (feature maps)
For applying 1D CNN -
#With padding = same, the edge pixels are padded to not skip a few
#Out featuremaps = 10, kernel (3,)
cnn1d = layers.Conv1D(10, 3, padding='same')(X_2D)
print(X_2D.shape,'->',cnn1d.shape)
#(100, 7, 3) -> (100, 7, 10)
For applying 2D CNN -
#Out featuremaps = 10, kernel (3,3)
cnn2d = layers.Conv2D(10, (3,3), padding='same')(X_3D)
print(X_3D.shape,'->',cnn2d.shape)
#(100, 5, 7, 3) -> (100, 5, 7, 10)
For 3D CNN -
#Out featuremaps = 10, kernel (3,3)
cnn3d = layers.Conv3D(10, (3,3,2), padding='same')(X_4D)
print(X_4D.shape,'->',cnn3d.shape)
#(100, 6, 6, 2, 3) -> (100, 6, 6, 2, 10)