I have an array with some values that are zero and some that are non-zero. When I apply a softmax, I want the non-zero values to add up to 1 and the zeros to stay zero. But after the softmax, all values are non-zero and add up to 1.
Here's what I'm trying to do: I have some values
score[0]
<tf.Tensor: shape=(1, 48), dtype=float32, numpy=
array([[ 2.405819 , 27.748499 , 16.080362 , 8.780167 , 16.615538 ,
19.353844 , 19.497992 , 16.051327 , 5.4946175 , 15.927819 ,
11.512515 , 19.716702 , 15.100697 , 26.370419 , 21.838608 ,
10.650975 , 9.212484 , 17.439907 , 14.322778 , 12.001259 ,
10.433163 , 10.011807 , 15.847178 , 18.343014 , 26.086296 ,
26.723047 , 17.28703 , -0.7059817 , 26.380203 , 21.49808 ,
14.828656 , 13.711437 , 19.565845 , 5.9418716 , 12.614753 ,
29.56828 , 1.1372657 , 25.873251 , 36.031494 , -7.397362 ,
12.691793 , 4.3349338 , 15.1586275 , 14.650254 , 14.632486 ,
18.829857 , 21.885925 , 0.56010276]], dtype=float32)>
and a mask
mask_test[0]
<tf.Tensor: shape=(1, 48), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1]])>
I multiply the values with the mask
score = tf.multiply(score, tf.cast(mask_test, tf.float32))
score[0]
<tf.Tensor: shape=(1, 48), dtype=float32, numpy=
array([[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , -0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , -0. ,
0. , 0. , 0. , 0. , 0. ,
18.829857 , 21.885925 , 0.56010276]], dtype=float32)>
That works fine. Now I want to apply a softmax so that all non-zero values add up to 1. The zeros should stay 0.
attention_weights = tf.nn.softmax(score, axis=-1)
attention_weights[0]
<tf.Tensor: shape=(1, 48), dtype=float32, numpy=
array([[2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 2.9859784e-10, 2.9859784e-10, 2.9859784e-10,
2.9859784e-10, 4.4956207e-02, 9.5504379e-01, 5.2280064e-10]],
dtype=float32)>
And the result is that all values are non-zero. I guess that comes from the exponential in the softmax. Is there a way to achieve this with the softmax, or is there another way? The mask is not always the same.
Thanks in advance
Softmax does not work that way. Take a look at the formula of softmax: softmax(x)_i = exp(x_i) / Σ_j exp(x_j). Since exp(0) = 1, every zero entry still contributes to the denominator and comes out with a non-zero probability.
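A quick demonstration of the effect (a minimal sketch; the numbers are just for illustration):
import tensorflow as tf

# exp(0) = 1, so zeroed-out entries still receive probability mass
x = tf.constant([0.0, 0.0, 2.0])
print(tf.nn.softmax(x).numpy())  # ~[0.107, 0.107, 0.787], not [0, 0, 1]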
You would need to define a custom function for this.
A simple way of doing this would be:
import numpy as np
import tensorflow as tf

def custom_soft_max(arr):
    arr = np.asarray(arr, dtype=np.float32).copy()  # work on a copy, not in place
    non_zero_indices = np.where(arr != 0)
    logits = arr[non_zero_indices]  # gather only the non-zero entries
    arr[non_zero_indices] = tf.nn.softmax(logits).numpy()
    return arr
This excludes every index whose value is 0 and performs the softmax only on the non-zero entries, so the zeros stay exactly 0.
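If you'd rather stay entirely inside TensorFlow (and keep everything differentiable), a common alternative, used for example in attention layers, is to push the masked positions toward -infinity before the softmax instead of multiplying by the mask. A sketch, assuming score is your original (unmasked) tensor and mask_test is the 0/1 mask from your question:
import tensorflow as tf

def masked_softmax(score, mask, axis=-1):
    # Replace masked positions with a large negative number;
    # exp() then underflows to 0, so they get zero weight
    neg_inf = tf.constant(-1e9, dtype=score.dtype)
    masked_score = tf.where(tf.cast(mask, tf.bool), score, neg_inf)
    return tf.nn.softmax(masked_score, axis=axis)

attention_weights = masked_softmax(score, mask_test)
Unlike the indexing approach above, this also works row by row on batched inputs, since the softmax is still applied along axis=-1.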