I would like to know: how does the algorithm update the bias in this situation?
or
Both give me different results. Or is the way I set up the bias above wrong? I think there should be a different bias per perceptron.
There is one bias per neuron, not one global bias. In typical implementations you see a single bias variable because it is a vector, whose i-th dimension is added to the i-th neuron.
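To make this concrete, here is a minimal NumPy sketch (the layer sizes and variable names are my own, not from your question) of a layer where `b` is one variable but holds one bias per neuron:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 3, 4                      # layer sizes (arbitrary, for illustration)
W = rng.standard_normal((n_out, n_in))  # one weight row per neuron
b = rng.standard_normal(n_out)          # ONE variable, but one bias PER neuron

x = rng.standard_normal(n_in)
z = W @ x + b                           # b[i] is added only to neuron i's pre-activation
```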
In the non-standard network you drew, the update rule is actually ... neither! It should be the sum of your equations. Note that if the bias is a vector, using the sum still works, because each of the partial derivatives you computed only affects its corresponding dimension.
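A hedged sketch of that update (my own notation; `dz` stands for a hypothetical ∂E/∂z already computed by backprop, and the learning rate value is arbitrary). With a vector bias each dimension gets its own partial; with a single shared bias you sum the partials; and summing the per-equation gradient vectors of a vector bias gives the same result, because each one is zero outside its own dimension:

```python
import numpy as np

eta = 0.1                                # learning rate (illustrative value)
dz = np.array([0.2, -0.5, 0.1, 0.3])     # hypothetical dE/dz from backprop

# Vector bias: z[i] = (W @ x)[i] + b[i], so dE/db[i] = dz[i].
b_vec = np.zeros(4)
b_vec -= eta * dz

# Single shared bias: z[i] = (W @ x)[i] + b, so dE/db = sum_i dz[i]
# (the sum of the per-neuron partial derivatives -- "the sum of your equations").
b_shared = 0.0
b_shared -= eta * dz.sum()

# With a vector bias, summing the per-equation gradient vectors also works,
# because each equation's gradient is zero outside its own dimension:
grads = [np.eye(4)[i] * dz[i] for i in range(4)]  # one gradient per equation
assert np.allclose(sum(grads), dz)
```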