I'm new to machine learning, and one thing I don't understand about convolutional neural networks is why we apply an activation function after a convolution layer.
Because a convolution followed by a convolution is a convolution. Therefore, a convolutional neural network of arbitrary depth without intervening non-linear layers of some sort (such as a ReLU layer) is fundamentally equivalent to a convolutional neural network with only one layer. This is because a composition of linear transformations is itself linear:
y = m1 * (m2 * x + b2) + b1
  = (m1 * m2) * x + (m1 * b2 + b1)
Which is just a linear function of x, with slope m1 * m2 and intercept m1 * b2 + b1. Why learn two layers when a single one can represent exactly the same function? The same logic applies to convolutions, which are linear maps that act on local patches of the input. So for convolutional NNs, and for vanilla NNs too, we must do something, anything, non-linear in between the linear layers. One incredibly simple non-linear function is the ReLU, max(0, x), which is just a basic "bend".
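If you want to check this numerically, here is a minimal NumPy sketch. It is only an illustration under some assumptions of mine: a 1-D signal, two made-up random kernels `k1` and `k2`, no bias terms, and "full" padding so that the associativity of convolution holds exactly. (Deep learning libraries typically compute cross-correlation rather than flipped-kernel convolution, but the same collapse argument applies.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=32)   # 1-D input signal (illustrative data)
k1 = rng.normal(size=3)   # kernel of the first "layer" (hypothetical)
k2 = rng.normal(size=5)   # kernel of the second "layer" (hypothetical)

# Two stacked convolution layers with nothing in between.
two_layers = np.convolve(np.convolve(x, k1, mode="full"), k2, mode="full")

# A single convolution layer whose kernel is k1 convolved with k2.
merged = np.convolve(k1, k2, mode="full")   # length 3 + 5 - 1 = 7
one_layer = np.convolve(x, merged, mode="full")

# Without an activation, the two-layer network is exactly one layer.
linear_match = np.allclose(two_layers, one_layer)

def relu(v):
    """Element-wise max(0, v): the simple 'bend'."""
    return np.maximum(v, 0.0)

# With a ReLU in between, the merged single kernel no longer reproduces it.
with_relu = np.convolve(relu(np.convolve(x, k1, mode="full")), k2, mode="full")
relu_match = np.allclose(with_relu, one_layer)

print("no activation, same as one layer:", linear_match)
print("with ReLU, same as one layer:", relu_match)
```

The first check passes because convolution is associative, so the two kernels merge into one 7-tap kernel. The second fails because the ReLU zeroes out the negative intermediate values, and no single kernel can undo that.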