My network's penultimate layer has shape (U, C), where C is the number of channels. I'd like to apply the softmax function across each channel separately. For example, if U=2 and C=3, and the layer produces [[1, 2, 3], [10, 20, 30]], I'd like the output to be softmax(1, 2, 3) for channel 0 and softmax(10, 20, 30) for channel 1.
Is there a way I can do this with Keras? I'm using TensorFlow as the backend.
Please also explain how to ensure that the loss is the sum of both cross-entropies, and how I can verify that. (That is, I don't want the optimizer to train for the loss on only one of the softmaxes, but rather for the sum of each one's cross-entropy loss.) The model uses Keras's built-in categorical_crossentropy for the loss.
Define a Lambda layer and use the softmax function from the backend with the desired axis to compute the softmax over that axis:
from keras import backend as K
from keras.layers import Lambda
soft_out = Lambda(lambda x: K.softmax(x, axis=my_desired_axis))(input_tensor)
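As a sanity check, the same computation can be reproduced in plain NumPy (a minimal sketch; my_desired_axis above is a placeholder, and here axis=-1 is assumed so the softmax runs over each row, matching the example in the question):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability, then normalize
    # the exponentials along the chosen axis.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])

out = softmax(x, axis=-1)
print(out)
# Each row sums to 1, independently of the other row.
print(out.sum(axis=-1))
```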
Update: A numpy array with N dimensions has a shape of (d1, d2, d3, ..., dn). Each of these is called an axis. So the first axis (i.e. axis=0) has dimension d1, the second axis (i.e. axis=1) has dimension d2, and so on. Further, the most common case of an array is a 2D array, or matrix, which has a shape of (m, n), i.e. m rows (axis=0) and n columns (axis=1). Now, when we specify an axis for an operation, it means the operation should be computed over that axis. Let me make this clearer with examples:
>>> import numpy as np
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a.shape
(3, 4) # three rows and four columns
>>> np.sum(a, axis=0) # compute the sum over the rows (i.e. for each column)
array([12, 15, 18, 21])
>>> np.sum(a, axis=1) # compute the sum over the columns (i.e. for each row)
array([ 6, 22, 38])
>>> np.sum(a, axis=-1) # axis=-1 is equivalent to the last axis (i.e. columns)
array([ 6, 22, 38])
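The same axis logic carries over to softmax. A small sketch (using a plain-NumPy softmax as a stand-in for K.softmax) shows that axis=0 normalizes each column while axis=1 (equivalently axis=-1 for a 2D array) normalizes each row:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

a = np.arange(12).reshape(3, 4).astype(float)

cols = softmax(a, axis=0)  # each of the 4 columns sums to 1
rows = softmax(a, axis=1)  # each of the 3 rows sums to 1

print(cols.sum(axis=0))
print(rows.sum(axis=1))
```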
Now, the same thing holds for computing the softmax function in your example. You must first determine over which axis you want to compute the softmax and then specify it using the axis argument. Further, note that softmax is applied on the last axis by default (i.e. axis=-1), so if you want to compute it over the last axis you don't need the Lambda layer above. Just use the Activation layer instead:
from keras.layers import Activation
soft_out = Activation('softmax')(input_tensor)
Update 2: There is also another way of doing this, using the Softmax layer:
from keras.layers import Softmax
soft_out = Softmax(axis=desired_axis)(input_tensor)