I saw a video online that shows a "neural network" with three inputs and three outputs. Although the inputs are not changing, I believe there is enough similarity between this network and those used in other evolutionary algorithms to make the question valid.
My question is: since all three input nodes shown in the video can already "exert influence" on the output nodes through controlled weights, why are the four intermediate nodes necessary? Why not connect the input nodes directly to the outputs?
An artificial neural network consisting only of inputs and outputs is a (single-layer) perceptron. The realization that these networks could not solve many problems set back the use of artificial neural networks for over a decade!
For simplicity, imagine only one output neuron (many outputs can be treated as many similar problems solved in parallel). Furthermore, let's consider for the moment only one input. Each neuron uses an activation function, which determines its activity (output) as a function of the input it receives. For the activation functions used in practice*, more input means higher output (or equal output in some ranges, but let's ignore that). Chaining two such neurons also results in "the more input, the higher the final output".
With one output neuron, you interpret the result as "if the output is over a threshold, then A, otherwise B" (where "A" and "B" can mean different things). Because both of our neurons produce more signal the more input they receive, our network can only answer simple linear problems of the form "if the input signal is over a threshold, then A, otherwise B".
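To make this concrete, here is a minimal sketch in Python (the sigmoid activation, the weight, the bias, and the threshold are all just illustrative choices, not something taken from the video):

```python
import math

def sigmoid(x):
    # Monotonic activation: more input never means less output.
    return 1.0 / (1.0 + math.exp(-x))

def single_neuron(inp, weight=2.0, bias=-1.0, threshold=0.5):
    # One input, one output neuron: decide A or B by thresholding the activation.
    output = sigmoid(weight * inp + bias)
    return "A" if output > threshold else "B"

for x in [0.0, 0.3, 0.7, 1.0]:
    print(x, "->", single_neuron(x))
```

With these example numbers the decision boils down to "if the input is over 0.5, then A, otherwise B", which is exactly the kind of simple threshold rule described above.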
Using two inputs is very similar: we combine the outputs of two input neurons. Now we are in the situation "if the inputs to input neurons 1 and 2 are, together, high enough that our final output is over a threshold, then A, otherwise B". Graphically, this means we can decide between A and B by drawing a line (curvature allowed) in the input-1/input-2 plane.
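A rough two-input sketch under the same assumptions (weights, bias, and threshold are again arbitrary examples): the decision depends only on whether w1*x1 + w2*x2 + bias clears a value, which is the equation of a straight line in that plane.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def two_input_perceptron(x1, x2, w1=1.0, w2=1.0, bias=-1.5, threshold=0.5):
    # The sign of w1*x1 + w2*x2 + bias decides which side of a line (x1, x2) is on;
    # the monotonic activation cannot change which side that is.
    output = sigmoid(w1 * x1 + w2 * x2 + bias)
    return "A" if output > threshold else "B"

# With these example weights the network behaves like a logical AND:
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", two_input_perceptron(x1, x2))
```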
But there are problems that cannot be solved this way! Consider the XOR problem: our goal is to produce A when exactly one of the two inputs is high, and B when both are high or both are low.
If you plot these four cases in the input-1/input-2 plane, you will see that it is impossible to draw a line that puts all the A's on one side and all the B's on the other. And those lines represent all the possible one-layer perceptrons! We say that the XOR problem is not linearly separable (and this is why XOR is a traditional test for neural networks).
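One way to convince yourself (a brute-force illustration, not a proof): try a whole grid of weights and biases for a single two-input threshold unit and check that none of them reproduces XOR.

```python
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def linear_unit(x1, x2, w1, w2, b):
    # A single-layer perceptron: one straight-line decision boundary.
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

grid = [v / 2 for v in range(-10, 11)]  # weights and bias from -5 to 5 in steps of 0.5
solved = any(
    all(linear_unit(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
    for w1, w2, b in itertools.product(grid, repeat=3)
)
print("Some single-layer perceptron solves XOR:", solved)  # False
```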
Introducing at least one hidden layer makes it possible to solve this problem. In practice, it works like combining the results of two one-layer perceptrons.
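For example, here is a tiny hand-wired network (weights picked by hand purely for illustration): one hidden unit acts like OR, the other like AND, and the output neuron combines them into "OR but not AND", which is exactly XOR.

```python
def step(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # Each hidden unit is itself a one-layer perceptron:
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # The output combines them: "OR but not AND" is XOR.
    return step(h_or - 2 * h_and - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))
```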
Adding more neurons to the hidden layer makes it possible to solve more and more complex problems; in fact, with enough hidden neurons a single hidden layer can represent any function f(A,B).
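As a sketch of why that holds for Boolean functions of two inputs (again with hand-picked weights, not learned ones): use one hidden neuron per input pattern, and let the output neuron OR together the patterns that should map to 1.

```python
def step(x):
    return 1 if x > 0 else 0

def build_net(truth_table):
    # One hidden neuron per input pattern: it fires only for "its" pattern,
    # and the output neuron ORs together the hidden units whose pattern maps to 1.
    patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]

    def net(a, b):
        hidden = []
        for pa, pb in patterns:
            w_a = 1 if pa else -1
            w_b = 1 if pb else -1
            bias = 0.5 - (pa + pb)
            hidden.append(step(w_a * a + w_b * b + bias))
        active = sum(h for h, p in zip(hidden, patterns) if truth_table[p])
        return step(active - 0.5)

    return net

xor = build_net({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```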
You may know, however, that other networks use many more layers (see deep learning). In that case the motivation is not a theoretical limitation, but rather the search for networks that perform better in practice.
*Using weird hand-crafted activation functions will not make things better. You may be able to solve a specific problem, but still not all of them, and you would need to know how to design such an activation function in the first place.