I was reading about the skip layer connections in the ResNet paper, and how they were beneficial in training very deep networks.
Does it make sense to use such connections in smaller (i.e. AlexNet-like) networks of less than 10 layers?
Not really. Their main purpose is to improve gradient flow through the network, which in essence increases capacity without increasing the number of parameters. However, if a small network already has enough capacity for your use case, you do not really need more.
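
To make the mechanism concrete, here is a minimal sketch of a residual (skip) connection, assuming PyTorch; the block structure and channel sizes are illustrative, not taken from the ResNet paper. The key point is the `out + x` addition, which gives gradients an identity path around the convolutional layers.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x (identity shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # The skip connection: gradients flow through this addition
        # unchanged, bypassing the two conv layers above.
        return self.relu(out + x)

# Toy forward pass on a random feature map (shapes are arbitrary).
block = ResidualBlock(channels=16)
x = torch.randn(1, 16, 32, 32)
y = block(x)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```

In a shallow AlexNet-like network you could drop such a block in, but the identity path mainly pays off when many of these blocks are stacked and vanishing gradients would otherwise become a problem.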