In the U-Net, there are activation functions in all the layers, but there seems to be no activation function in the upsampling layers (which are done using transpose convolutions). Why is this more efficient than including an activation function?
From my understanding, activation functions provide non-linearity. So the question really is: what benefit is there to keeping the transpose convolutions linear while keeping the regular convolutions non-linear? Wouldn't it always be best to have an activation function in these layers too?
My only other intuition is that perhaps they're trying to keep the upsampling as closely related as possible to classical interpolation methods (e.g. bilinear upsampling). A minimal sketch of the layer I mean is below.
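To make the layer concrete, here is a minimal PyTorch sketch of one decoder step (my own simplified version with padded convolutions, not the original unpadded U-Net implementation): the transpose convolution has no activation of its own, and the non-linearity comes only from the convolutions that follow the skip-connection concatenation.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One simplified U-Net decoder step: learned upsampling + double conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Learned upsampling: a linear operation, no activation applied here.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # The usual double conv with ReLU supplies the non-linearity.
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # upsample (purely linear)
        x = torch.cat([skip, x], dim=1)  # concatenate the skip connection
        return self.conv(x)              # non-linearity happens here

# Example: decoder features 256ch at 32x32, skip features 128ch at 64x64.
block = UpBlock(256, 128)
x = torch.randn(1, 256, 32, 32)
skip = torch.randn(1, 128, 64, 64)
print(block(x, skip).shape)  # torch.Size([1, 128, 64, 64])
```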
I think your interpretation is right: they were trying to keep the process similar to upsampling with classic interpolation methods, which makes the architecture easier to interpret, while still leaving the network the flexibility to learn the best weights for the upsampling. In general, if you want more non-linearity, you can insert any activation function you like (such as a ReLU) right after that layer, but in my experience the performance will not change much.
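For instance, a minimal sketch (PyTorch; the channel sizes are just an example, not taken from any particular implementation) of adding a ReLU directly after the transpose-convolution upsampling:

```python
import torch
import torch.nn as nn

# Variant of the upsampling step with an extra non-linearity after the
# transpose convolution; the rest of the decoder block stays unchanged.
up_with_relu = nn.Sequential(
    nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2),
    nn.ReLU(inplace=True),  # optional activation right after upsampling
)

x = torch.randn(1, 256, 32, 32)
print(up_with_relu(x).shape)  # torch.Size([1, 128, 64, 64])
```

Whether this extra activation actually helps is something you would have to verify empirically on your own task.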