I'm trying to follow this tutorial by Google that uses a TensorFlow Estimator to train a CNN and recognise images: https://www.tensorflow.org/tutorials/estimators/cnn
The data I can see in the tutorial are: train_data, train_labels, eval_data, eval_labels:
    ((train_data, train_labels), (eval_data, eval_labels)) = tf.keras.datasets.mnist.load_data()
In the convolutional layers, shouldn't there be filter (feature) data that gets multiplied with the input image data? I don't see it anywhere in the code.
According to this guide, the input image data is multiplied with filter image data to check for low-level features (curves, edges, etc.), so there should be filter image data somewhere too (the right matrix in the image below)?: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks
The filters are the weight matrices of the conv2d layers used in the model, and are not pre-loaded images like the "butt curve" from the guide you linked. If they were, we would need to provide the CNN with every possible type of shape, curve and colour, and hope that any unseen data we feed the model contains this finite set of images somewhere within it for the model to recognise.
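To make that concrete, here is a minimal sketch (using the tf.keras equivalent of the tutorial's tf.layers.conv2d; the sizes are purely illustrative) showing that a convolutional layer's filters are just trainable weight tensors created with the layer, not images you have to supply:

    import tensorflow as tf

    # A conv layer's "filters" are trainable weight tensors owned by the layer.
    # Illustrative sizes: 32 filters of 5x5 over single-channel 28x28 inputs.
    conv = tf.keras.layers.Conv2D(filters=32, kernel_size=5, padding="same")
    conv.build(input_shape=(None, 28, 28, 1))

    kernel, bias = conv.weights
    print(kernel.shape)  # (5, 5, 1, 32): height, width, in_channels, number_of_filters
    print(bias.shape)    # (32,)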
Instead, we allow the CNN to learn the filters it requires to successfully classify from the data itself, and hope it can generalise to new data. Through many iterations and a lot of data (which CNNs do require), the model iteratively crafts the best set of filters for it to successfully classify the images. The random initialisation at the start of training ensures that the filters in each layer learn to identify different features of the input image.
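A rough sketch of what that looks like in practice (tf.keras, and the model layout here is just an example, not the tutorial's exact architecture): the first conv layer's kernels start as random numbers and are nudged by gradient descent during training, so meaningful "filter images" only exist after fitting:

    import numpy as np
    import tensorflow as tf

    (train_data, train_labels), _ = tf.keras.datasets.mnist.load_data()
    train_data = train_data[..., np.newaxis].astype("float32") / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    before = model.layers[0].get_weights()[0].copy()  # randomly initialised kernels
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_data[:5000], train_labels[:5000], epochs=1, verbose=0)
    after = model.layers[0].get_weights()[0]

    # The kernels have moved away from their random starting values.
    print(np.abs(after - before).mean())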
The fact that earlier layers usually correspond to colours and edges (like above) is not predefined; rather, the network has learned that looking for edges in the input is the best way to build up context for the rest of the image, and thereby classify it (humans do much the same initially).
The network uses these primitive filters in earlier layers to generate more complex interpretations in deeper layers. This is the power of distributed learning: representing complex functions through multiple applications of much simpler functions.
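You can see that composition directly in the weight shapes (again a tf.keras sketch with illustrative layer sizes): the second conv layer's filters don't look at raw pixels at all, they span all of the feature maps produced by the first layer:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu",
                               input_shape=(28, 28, 1)),                    # filters over raw pixels
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),   # filters over feature maps
        tf.keras.layers.MaxPooling2D(2),
    ])

    conv1_kernel = model.layers[0].weights[0]
    conv2_kernel = model.layers[2].weights[0]
    print(conv1_kernel.shape)  # (5, 5, 1, 32)  - each filter spans the single input channel
    print(conv2_kernel.shape)  # (5, 5, 32, 64) - each filter spans all 32 earlier feature maps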