Typically, a simple neural network for solving XOR has 2 inputs, 2 neurons in the hidden layer, and 1 neuron in the output layer.
However, the following example implementation has 2 output neurons, and I don't get it:
Why did the author put 2 output neurons there?
Edit: The author of the example noted that he is using 4 neurons in the hidden layer and 2 neurons in the output layer. But I still don't get why: why a shape of {4,2} instead of {2,1}?
This is called one-hot encoding. The idea is that you have one output neuron per class, and each neuron gives the probability of that class. XOR has two classes (output 0 and output 1), so the network gets two output neurons, which is where the 2 in {4,2} comes from.
I don't know why he uses 4 hidden neurons; 2 should be enough (if I remember correctly).
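For concreteness, here is a minimal NumPy sketch (my own, not the author's code) of an XOR network with one-hot outputs: two output neurons, one per class, and the prediction is whichever neuron activates more strongly.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and one-hot targets: class 0 -> [1, 0], class 1 -> [0, 1]
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)

n_hidden = 2  # 2 hidden units suffice in principle; training can be
              # sensitive to the random init, so try 4 if it stalls

W1 = rng.normal(size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 2))
b2 = np.zeros(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for _ in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)   # hidden activations, shape (4, n_hidden)
    o = sigmoid(h @ W2 + b2)   # output scores, shape (4, 2), one per class
    # backward pass (squared-error loss, sigmoid derivative)
    d_o = (o - Y) * o * (1 - o)
    d_h = (d_o @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_o
    b2 -= lr * d_o.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

# the predicted class is the output neuron with the strongest activation
print(o.round(2))          # per-class scores for each input row
print(o.argmax(axis=1))    # expected: [0 1 1 0]
```

With a {2,1} shape you would instead train a single output neuron against targets 0 and 1; both formulations work for XOR, but one-hot generalizes directly to problems with more than two classes.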