I am trying to figure out whether I am building an artificial neural network with the sigmoid activation function and using the bias correctly. I want one bias node that feeds into all hidden nodes with a constant output of -1 (combined with its weight), and another bias node feeding into the output node, also with a constant output of -1 (combined with its weight). I can then train these bias weights exactly like I would train the other weights, correct?
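Here is a rough sketch of what I mean (made-up sizes and names, just to illustrate the wiring): the bias node always outputs -1, and only the weights attached to it are learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2])        # input vector (example values)
W_h = np.random.randn(3, 2)     # input -> hidden weights
b_h = np.random.randn(3)        # weights from the bias node to each hidden node
W_o = np.random.randn(3)        # hidden -> output weights
b_o = np.random.randn()         # weight from the bias node to the output node

hidden = sigmoid(W_h @ x + b_h * (-1.0))      # bias node contributes -1 * its weight
output = sigmoid(W_o @ hidden + b_o * (-1.0))
```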
Your reasoning is correct, although it is rather uncommon to use a constant of -1 (why not +1?); I have never seen that convention in the literature. If you maintain the correct graph structure, then there is no difference between updating weights for "real" nodes and for bias nodes.

The only difference arises if you do not store the graph structure and therefore do not "know" that the bias (the one connected to the output node) has no children, so its signal must not be back-propagated deeper into the net. I have seen code that simply stores layers as arrays and places the bias at index 0, so back-propagation can iterate from index 1. Obviously, a graph-based implementation is much more readable (but also much slower, since you cannot vectorize your computations).
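As a rough illustration of that array-based layout (just a sketch with made-up names, not taken from any particular library): the bias sits at index 0 of each activation array, its weight is updated with exactly the same rule as every other weight, and back-propagation simply never sends an error signal "into" index 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_h, W_o):
    # Index 0 of each activation array is the bias node with constant output -1.
    a_in = np.concatenate(([-1.0], x))
    a_hid = np.concatenate(([-1.0], sigmoid(W_h @ a_in)))
    y = sigmoid(W_o @ a_hid)
    return a_in, a_hid, y

def train_step(x, target, W_h, W_o, lr=0.1):
    a_in, a_hid, y = forward(x, W_h, W_o)
    # Output delta: same rule for every incoming weight,
    # including the one attached to the bias entry a_hid[0].
    delta_o = (y - target) * y * (1.0 - y)
    # Propagate the error to the hidden nodes, skipping index 0:
    # the bias has no incoming connections, so nothing flows "through" it.
    delta_h = (W_o[1:] * delta_o) * a_hid[1:] * (1.0 - a_hid[1:])
    W_o -= lr * delta_o * a_hid              # bias weight W_o[0] updated like the rest
    W_h -= lr * np.outer(delta_h, a_in)      # bias weights W_h[:, 0] updated like the rest
    return W_h, W_o

# Example usage: 2 inputs + bias, 3 hidden nodes + bias, 1 output.
rng = np.random.default_rng(0)
W_h = rng.standard_normal((3, 3))
W_o = rng.standard_normal(4)
W_h, W_o = train_step(np.array([0.5, 0.2]), 1.0, W_h, W_o)
```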