Tags: python, tensorflow, keras, softmax

Producing a softmax on two channels in Tensorflow and Keras


My network's penultimate layer has shape (U, C), where C is the number of channels. I'd like to apply the softmax function to each channel separately.

For example, if U=2 and C=3 and the layer produces [[1, 2, 3], [10, 20, 30]], I'd like the output to be softmax(1, 2, 3) for channel 0 and softmax(10, 20, 30) for channel 1.

Is there a way I can do this with Keras? I'm using TensorFlow as the backend.

UPDATE

Please also explain how to ensure that the loss is the sum of both cross-entropies, and how I can verify that. (That is, I don't want the optimizer to train only on the loss of one of the softmaxes, but rather on the sum of each one's cross-entropy loss.) The model uses Keras's built-in categorical_crossentropy for the loss.


Solution

  • Define a Lambda layer and use the softmax function from the backend with the desired axis to compute the softmax over that axis:

    from keras import backend as K
    from keras.layers import Lambda
    
    soft_out = Lambda(lambda x: K.softmax(x, axis=my_desired_axis))(input_tensor)
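
    For example, a minimal sketch of wiring this into a small model (the input shape and layer below are illustrative, not the question's actual network):

    from keras import backend as K
    from keras.layers import Input, Lambda
    from keras.models import Model

    # illustrative shapes: U=2 units, C=3 channels (the batch axis is implicit)
    inp = Input(shape=(2, 3))
    soft_out = Lambda(lambda x: K.softmax(x, axis=-1))(inp)  # softmax over the last axis
    model = Model(inp, soft_out)
    model.summary()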
    

    Update: A numpy array with N dimensions has a shape of (d1, d2, d3, ..., dn). Each of these is called an axis. So the first axis (i.e. axis=0) has dimension d1, the second axis (i.e. axis=1) has dimension d2, and so on. The most common case is a 2D array, or matrix, which has a shape of (m, n): m rows (axis=0) and n columns (axis=1). When we specify an axis for an operation, it means the operation should be computed over that axis. Let me make this clearer with examples:

    >>> import numpy as np
    >>> a = np.arange(12).reshape(3,4)
    >>> a
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    
    >>> a.shape
    (3, 4)   # three rows and four columns
    
    >>> np.sum(a, axis=0)  # compute the sum over the rows (i.e. for each column)
    array([12, 15, 18, 21])
    
    >>> np.sum(a, axis=1)  # compute the sum over the columns (i.e. for each row)
    array([ 6, 22, 38])
    
    >>> np.sum(a, axis=-1) # axis=-1 is equivalent to the last axis (i.e. columns)
    array([ 6, 22, 38])
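
    One caveat when carrying this over to Keras: a layer's output tensor has an implicit batch axis in front, so a layer whose shape is (U, C) is really a tensor of shape (batch, U, C), and the channel axis becomes axis=2, or equivalently axis=-1:

    >>> b = np.arange(6).reshape(1, 2, 3)  # shape (batch=1, U=2, C=3)
    >>> np.sum(b, axis=-1)                 # sum over the last axis, per (batch, row)
    array([[ 3, 12]])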
    

    Now, in your example, the same thing holds for computing the softmax function. You must first determine over which axis you want to compute the softmax and then specify it using the axis argument. Further, note that softmax by default is applied on the last axis (i.e. axis=-1), so if you want to compute it over the last axis you don't need the Lambda layer above. Just use the Activation layer instead:

    from keras.layers import Activation
    
    soft_out = Activation('softmax')(input_tensor)
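
    As a quick sanity check, here is a plain numpy version of the example from the question (the softmax helper below is just for illustration): softmax over the last axis gives exactly softmax(1, 2, 3) for the first row and softmax(10, 20, 30) for the second:

    import numpy as np

    def softmax(x, axis=-1):
        # subtract the max for numerical stability; it cancels out in the ratio
        e = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return e / np.sum(e, axis=axis, keepdims=True)

    a = np.array([[1., 2., 3.],
                  [10., 20., 30.]])
    print(softmax(a, axis=-1))
    # [[0.090 0.245 0.665]   <- softmax(1, 2, 3)
    #  [0.000 0.000 1.000]]  <- softmax(10, 20, 30), up to rounding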
    

    Update 2: There is also another way of doing this, using the Softmax layer:

    from keras.layers import Softmax
    
    soft_out = Softmax(axis=desired_axis)(input_tensor)
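
    Regarding the update about the loss: with an output of shape (U, C) and softmax over the last axis, the built-in categorical_crossentropy is also computed over the last axis, so it yields one cross-entropy per row; Keras then averages those values into the scalar loss, which is proportional to their sum, so the optimizer trains on both. A sketch of how you could verify this (the toy values below are made up for illustration):

    import numpy as np
    from keras import backend as K

    # toy predictions and one-hot targets, shape (batch=1, U=2, C=3)
    y_pred = np.array([[[0.7, 0.2, 0.1],
                        [0.1, 0.1, 0.8]]])
    y_true = np.array([[[1., 0., 0.],
                        [0., 0., 1.]]])

    # categorical_crossentropy over the last axis: one value per softmax
    ce = K.eval(K.categorical_crossentropy(K.constant(y_true), K.constant(y_pred)))
    print(ce)        # [[0.357 0.223]] i.e. -log(0.7) and -log(0.8)
    print(ce.sum())  # the sum of both cross-entropies
    # the scalar loss reported during training is the mean of these values,
    # i.e. the sum divided by U, so both terms contribute to the gradients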