I am somewhat confused by the use of the terms linear/non-linear when discussing neural networks. Can anyone clarify these 3 points for me:
A neural network is only non-linear if you squash the output signal from the nodes with a non-linear activation function. A complete neural network (with non-linear activation functions) can act as an arbitrary function approximator.
Bonus: It should be noted that if you use linear activation functions in multiple consecutive layers, you could just as well prune them down to a single layer, because a composition of linear functions is itself linear. (The single layer's weights would simply be the product of the original weight matrices, plus a combined bias.) A network with multiple layers and linear activation functions can therefore not model any more complicated functions than a network with a single layer.
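A quick NumPy sketch of what I mean, with made-up weights (just an illustration, not anybody's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two consecutive layers with *linear* (identity) activations;
# the weights and biases here are arbitrary, purely for illustration.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=2)

x = rng.normal(size=4)
two_layer = (x @ W1 + b1) @ W2 + b2

# The same mapping collapsed into a single linear layer:
# W = W1 @ W2 and b = b1 @ W2 + b2.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layer, one_layer))  # True
```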
The squashed output signal could very well be interpreted as the strength of this signal (biologically speaking). Though it might be incorrect to interpret the output strength as an equivalent of confidence, as in fuzzy logic.
Yes, you are spot on. The input signals together with their respective weights form a linear combination. The non-linearity comes from your choice of activation functions. Remember that a linear function can be drawn as a straight line - sigmoid, tanh, ReLU and so on cannot be drawn as a single straight line.
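As a rough sketch of a single node (the inputs, weights and bias below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values for a single node.
x = np.array([0.5, -1.2, 3.0])   # input signals
w = np.array([0.8, 0.1, -0.4])   # corresponding weights
b = 0.2                          # bias

z = w @ x + b                    # linear combination of inputs and weights
a = sigmoid(z)                   # non-linear squashing of the output signal
print(z, a)
```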
Most functions and classification tasks are probably best described by non-linear functions. If we decided to use linear activation functions we would end up with a much coarser approximation of a complex function.
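XOR is the classic illustration: no linear model can fit it, whereas a tiny network with a non-linear hidden layer represents it exactly (the weights below are hand-picked rather than learned, purely to make the point):

```python
import numpy as np

# XOR: a function no linear model can represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Best possible linear fit (with bias) via least squares.
A = np.hstack([X, np.ones((4, 1))])           # add a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("linear predictions:", A @ coef)         # roughly [0.5, 0.5, 0.5, 0.5]

# Hand-picked two-layer network with ReLU hidden units:
# h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), output = h1 - 2*h2.
def relu(z):
    return np.maximum(z, 0.0)

W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])
hidden = relu(X @ W1 + b1)
print("non-linear predictions:", hidden @ W2)  # [0. 1. 1. 0.]
```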
You can sometimes read in papers that neural networks are universal approximators. This implies that a "perfect" network could be fitted to essentially any model/function you could throw at it (more precisely, any continuous function can be approximated arbitrarily well), though configuring that perfect network (number of nodes, number of layers, and so on) is a non-trivial task.
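As a deterministic toy sketch of this idea (the knots and weights are constructed by hand rather than learned), a one-hidden-layer ReLU network can piecewise-linearly interpolate sin(x), and adding more hidden units tightens the fit:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Knots where the approximation matches sin(x) exactly; one hidden ReLU unit
# per segment. Increasing the number of knots makes the fit arbitrarily close.
knots = np.linspace(0.0, 2.0 * np.pi, 9)
targets = np.sin(knots)
slopes = np.diff(targets) / np.diff(knots)   # slope of each linear segment
coefs = np.diff(slopes, prepend=0.0)         # change of slope at each knot

def approx_sin(x):
    # One hidden layer: h_i = relu(x - k_i); output = sin(k_0) + sum_i c_i * h_i
    hidden = relu(x[:, None] - knots[:-1][None, :])
    return targets[0] + hidden @ coefs

x = np.linspace(0.0, 2.0 * np.pi, 200)
err = np.max(np.abs(approx_sin(x) - np.sin(x)))
print(f"max error with {len(knots) - 1} hidden units: {err:.3f}")
```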
Read more about the implications on the Wikipedia page for the universal approximation theorem.