The conv-ReLU layers are responsible for finding patterns in the image. Each convolutional filter looks at one small local region at a time and responds to local structures such as edges, textures, and simple shapes. The ReLU activation then introduces non-linearity: it zeroes out the negative filter responses and keeps the positive ones, which lets stacked layers represent patterns that a purely linear model could not.
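To make the conv-ReLU step concrete, here is a minimal NumPy sketch. The helper names (`conv2d_valid`, `relu`), the tiny image, and the hand-picked edge kernel are all assumptions made for illustration; in a real network the kernel values are learned, and a framework's convolution layer would be used instead of explicit loops.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only sees a small local patch of the image.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses, set negative ones to zero."""
    return np.maximum(x, 0.0)

# Hypothetical 4x5 image: a bright vertical stripe (1) on a dark background (0).
image = np.array([[0, 0, 1, 1, 0]] * 4, dtype=float)

# Hand-picked kernel (an assumption): positive for dark-to-bright steps, left to right.
kernel = np.array([[-1.0, 1.0]])

response = conv2d_valid(image, kernel)   # each row: [0, 1, 0, -1]
activated = relu(response)               # each row: [0, 1, 0, 0]
print(activated)
```

The filter responds with +1 at the dark-to-bright edge and -1 at the bright-to-dark edge; ReLU keeps the first and discards the second, so this feature map marks only one kind of edge.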
The affine-ReLU (fully connected) layers connect all the information from the previous layer and look at the bigger picture. Every unit takes a weighted sum over all of the incoming features, so these layers can combine features and relate different parts of the picture to one another.
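An affine-ReLU layer can be sketched in a few lines of NumPy. The function name `affine_relu`, the 2x2 feature map, and the weights below are hypothetical values chosen by hand so the arithmetic is easy to follow; a trained network would learn `W` and `b`.

```python
import numpy as np

def affine_relu(features, W, b):
    """Flatten the feature map, apply the affine map W @ x + b, then ReLU."""
    x = features.ravel()                 # every spatial position becomes one input
    return np.maximum(W @ x + b, 0.0)

# Hypothetical 2x2 feature map coming from an earlier layer.
features = np.array([[0.0, 1.0],
                     [1.0, 0.0]])

# Each row of W is one unit; each unit has a weight for EVERY input position.
W = np.array([[1.0,  1.0,  1.0, 1.0],    # unit 0: counts active features anywhere
              [1.0, -1.0, -1.0, 1.0]])   # unit 1: main diagonal minus anti-diagonal
b = np.array([-1.5, 0.0])

hidden = affine_relu(features, W, b)     # unit 0 -> 0.5, unit 1 -> 0.0
print(hidden)
```

Unlike a convolutional filter, each unit here sees the whole flattened input at once, which is what lets it respond to combinations of features from different locations.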
The two kinds of layers learn complex patterns by working together: the conv-ReLU layers extract local features, and the affine-ReLU layers combine them. Without the affine-ReLU layers, the model would have a much harder time capturing high-level abstractions and finding relationships between different parts of the image.
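As a rough illustration of the two stages working together, the sketch below chains a conv-ReLU step into an affine-ReLU step. Everything in it is an assumption for demonstration purposes: the helper names, the one-row "images", and the hand-set weights, which are chosen so that the final unit fires only when an edge appears at both the far left and the far right.

```python
import numpy as np

def conv_relu(image, kernel):
    """Local pattern detection: valid convolution (stride 1) followed by ReLU."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def affine_relu(features, W, b):
    """Global combination: weighted sum over all positions followed by ReLU."""
    return np.maximum(W @ features.ravel() + b, 0.0)

kernel = np.array([[-1.0, 1.0]])             # dark-to-bright edge detector (hand-picked)

# Hand-set weights (an assumption): look at the first and last feature positions.
W = np.array([[1.0, 0.0, 0.0, 0.0, 1.0]])
b = np.array([-1.0])                         # needs both edges to get above zero

both_edges = np.array([[0, 1, 0, 0, 0, 1]], dtype=float)
one_edge   = np.array([[0, 1, 0, 0, 0, 0]], dtype=float)

out_both = affine_relu(conv_relu(both_edges, kernel), W, b)   # [1.0]
out_one  = affine_relu(conv_relu(one_edge, kernel), W, b)     # [0.0]
print(out_both, out_one)
```

The convolution finds each edge locally, but only the affine unit, which sees all positions at once, can tell that two distant edges occur together.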