deep-learning · lstm · recurrent-neural-networks · sigmoid

Why does LSTM use a sigmoid function to mimic the gate mechanism instead of a binary value (0/1)?


In LSTM, we usually use a sigmoid function to mimic the gate mechanism (a "soft" gate), but the problem is that in many cases this function gives a value around 0.5, which does not mean anything in terms of gating. Why not use a binary value (0/1) instead? What is the basic idea and intuition behind using the sigmoid function in LSTM and GRU?


Solution

  • A binary (step) function in your network would cause problems with backpropagation, since it's not differentiable in any useful sense: its derivative is zero everywhere except at the threshold, where it is a Dirac delta, which won't play nice in numeric computations. The sigmoid is a smooth approximation of that step; its derivative σ(x)(1 − σ(x)) is well-defined everywhere, so gradients can flow back through the gate to the weights that produced it. A small demonstration follows below.
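
As a minimal sketch (using PyTorch here purely as an illustration, not something from the original post), the snippet below compares the gradient flowing through a soft sigmoid gate with the gradient through a hard 0/1 gate obtained by rounding. The rounding step stands in for a binary gate; its derivative is zero almost everywhere, so it kills the learning signal:

```python
import torch

# Soft gate: sigmoid is smooth, so a useful gradient flows back.
x = torch.tensor([0.5], requires_grad=True)
soft_gate = torch.sigmoid(x)
soft_gate.backward()
print(x.grad)  # ~0.2350, i.e. sigma(x) * (1 - sigma(x))

# Hard gate: rounding sigmoid's output to 0/1 mimics a binary gate.
# round() has zero derivative almost everywhere, so no gradient
# signal ever reaches x (or the weights that produced it).
y = torch.tensor([0.5], requires_grad=True)
hard_gate = torch.round(torch.sigmoid(y))
hard_gate.backward()
print(y.grad)  # tensor([0.])
```

With the soft gate, backpropagation can nudge the gate toward being more open or more closed; with the hard gate, the zero gradient means the gate's behavior can never be learned.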