I don't understand why we use 'e' so often in neural networks, whether it's the sigmoid function or the softmax function.
In the sigmoid function we are essentially compressing the values y = mx + b into the range 0-1, so why do we specifically use 'e'? Intuitively it would seem to make sense to use '2' instead of 'e'. I mean, we are doing binary classification, so that seems fitting, right?
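To make the comparison concrete, here is a quick sketch (plain Python, function names are just mine) of the standard base-e sigmoid next to a hypothetical base-2 variant. Both squash any real input into (0, 1) and both cross 0.5 at z = 0, which is part of why the question arises in the first place:

```python
import math

def sigmoid_e(z):
    # standard sigmoid: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_2(z):
    # hypothetical base-2 variant: 1 / (1 + 2^(-z))
    return 1.0 / (1.0 + 2.0 ** (-z))

# Both map the real line into (0, 1) and agree exactly at z = 0
for z in [-2.0, 0.0, 2.0]:
    print(f"z={z:+.1f}  base-e: {sigmoid_e(z):.4f}  base-2: {sigmoid_2(z):.4f}")
```

The base-2 curve has the same S shape, just stretched horizontally (since 2^z = e^(z ln 2), it is the same function with a rescaled input), so the choice of base really does look arbitrary at first glance.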
Also, in the softmax function we compute e^x / sum(e^x). Why do we need to do that? We are trying to get the probability of which class x belongs to, right? So why can't we just do something like x / sum(abs(x))?
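Here is a quick sketch of what I mean (plain Python, names are just mine), putting the standard softmax side by side with the proposed x / sum(abs(x)) normalization. One thing the sketch already shows: when a score is negative, the naive version can produce a negative "probability", while softmax's exponentiation keeps everything positive before normalizing:

```python
import math

def softmax(xs):
    # standard softmax: exponentiate, then normalize to sum to 1
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def naive_normalize(xs):
    # proposed alternative: divide each score by sum of absolute values
    total = sum(abs(x) for x in xs)
    return [x / total for x in xs]

scores = [2.0, -1.0, 0.5]
print("softmax:", softmax(scores))          # all entries positive, sums to 1
print("naive:  ", naive_normalize(scores))  # the -1.0 score yields a negative entry
```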