
Why do both tf.nn.relu and tf.nn.sigmoid work the same in this custom estimator?


This is the guide for making a custom estimator in TensorFlow: https://www.tensorflow.org/guide/custom_estimators

The hidden layers are made using tf.nn.relu:

# Build the hidden layers, sized according to the 'hidden_units' param.
for units in params['hidden_units']:
    net = tf.layers.dense(net, units=units, activation=tf.nn.relu)

I altered the example a bit to learn XOR, with hidden_units=[4] and n_classes=2. When the activation function is changed to tf.nn.sigmoid, the example still works just as well. Why is that? Is it still giving the correct result because the XOR inputs are just zeros and ones?
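
A minimal sketch of that kind of swap, assuming the TF 1.x API used in the guide (the hidden_layers helper and the placeholder input here are just illustrative, not part of the original example):

import tensorflow as tf  # TF 1.x, as in the linked guide

def hidden_layers(net, hidden_units, activation=tf.nn.relu):
    # Stack dense layers; pass activation=tf.nn.sigmoid to reproduce the experiment.
    for units in hidden_units:
        net = tf.layers.dense(net, units=units, activation=activation)
    return net

# Illustrative XOR setup: two binary input features, one hidden layer of 4 units.
inputs = tf.placeholder(tf.float32, shape=[None, 2])
net = hidden_layers(inputs, hidden_units=[4], activation=tf.nn.sigmoid)
logits = tf.layers.dense(net, units=2, activation=None)  # n_classes=2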

Both activation functions give smooth loss curves that converge toward zero.


Solution

  • Regarding the XOR problem: ReLU's main advantage is that it avoids the vanishing-gradient problem, where the error value propagated back through many deep hidden layers shrinks toward zero.

    So sigmoid works fine as long as you use just one hidden layer, as in your setup.


    Sigmoid outputs values in the range 0–1, and its derivative is at most 0.25. Because backpropagation multiplies these partial derivatives layer by layer, the error value from the output layer becomes very small in the layers far from the output.

    [Plot comparing the two activations: the blue line is ReLU and the yellow line is sigmoid.]

    ReLU outputs x itself when x is greater than 0, so its gradient there is 1 and the error value can reach all the way back to the first layer (see the numeric sketch below).
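
    As a rough numeric sketch of that point (my own illustration, not from the original answer), the following assumes the best case for each activation and multiplies the per-layer derivatives the way backpropagation does:

    import numpy as np

    def sigmoid_grad(x):
        s = 1.0 / (1.0 + np.exp(-x))
        return s * (1.0 - s)          # at most 0.25, reached at x = 0

    def relu_grad(x):
        return 1.0 if x > 0 else 0.0  # exactly 1 for any positive input

    # Backprop multiplies one derivative factor per layer, so the error
    # signal reaching the first layer scales roughly like grad ** depth.
    for depth in (1, 5, 20):
        sig = sigmoid_grad(0.0) ** depth   # 0.25 ** depth
        rel = relu_grad(1.0) ** depth      # 1.0  ** depth
        print("%2d layers: sigmoid factor ~%.2e, relu factor %.1f" % (depth, sig, rel))

    # With a single hidden layer (the XOR case) the sigmoid factor is only 0.25,
    # which is easily large enough to learn from; at 20 layers it is ~9e-13,
    # which is where the vanishing gradient bites and ReLU makes the difference.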