Tags: machine-learning, keras, deep-learning, softmax, activation-function

What is the difference between keras.activations.softmax and keras.layers.Softmax?


What is the difference between keras.activations.softmax and keras.layers.Softmax? Why are there two definitions of the same activation function?

keras.activations.softmax: https://keras.io/activations/

keras.layers.Softmax: https://keras.io/layers/advanced-activations/


Solution

  • They are equivalent in terms of what they compute. In fact, the Softmax layer calls activations.softmax under the hood:

    def call(self, inputs):
        return activations.softmax(inputs, axis=self.axis)
    

    The difference is that the Softmax layer can be used directly as a layer:

    from keras.layers import Softmax
    
    soft_out = Softmax()(input_tensor)
    

    activations.softmax, on the other hand, cannot be used directly as a layer. Instead, you pass it as the activation function of another layer through the activation argument:

    from keras import activations
    from keras.layers import Dense
    
    dense_out = Dense(n_units, activation=activations.softmax)(input_tensor)
    

    Further, note that a nice thing about the Softmax layer is that it takes an axis argument, so you can compute the softmax over an axis other than the last axis of the input (which is the default); see the runnable sketch after this example:

    soft_out = Softmax(axis=desired_axis)(input_tensor)
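
    Putting the pieces together, here is a minimal, self-contained sketch (assuming the standalone keras package plus NumPy; with tf.keras the imports would start with tensorflow.keras, and the input shapes are arbitrary illustrative choices). It checks that applying the Softmax layer and applying activations.softmax (wrapped in a Lambda layer) give the same result, and it shows the axis argument in action:

    import numpy as np
    from keras import activations
    from keras.layers import Input, Dense, Lambda, Softmax
    from keras.models import Model

    # Shared logits so both softmax variants see the same input.
    inp = Input(shape=(5,))
    logits = Dense(5)(inp)

    # Softmax applied as a layer vs. applied as a function (via Lambda).
    out_layer = Softmax()(logits)
    out_fn = Lambda(lambda t: activations.softmax(t))(logits)

    model = Model(inp, [out_layer, out_fn])
    x = np.random.rand(2, 5).astype("float32")
    a, b = model.predict(x)
    print(np.allclose(a, b))  # True: both compute the same softmax

    # The axis argument: normalize over axis 1 of a (batch, 3, 5) input
    # instead of the default last axis.
    inp3d = Input(shape=(3, 5))
    model3d = Model(inp3d, Softmax(axis=1)(inp3d))
    y = model3d.predict(np.random.rand(2, 3, 5).astype("float32"))
    print(np.allclose(y.sum(axis=1), 1.0))  # each slice along axis 1 sums to 1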