This is a rather noob question. I am trying to code a neural network from scratch. This is how I have written the softmax function:
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
Then I created a dummy weights matrix:
w0 = np.random.uniform(-0.8, 0.8, (6, 4))
print(w0.shape)
print(w0)
This prints:
(6, 4)
array([[-0.47349099, -0.56027454, 0.78373698, -0.23283302],
[-0.63164942, -0.23417482, 0.00111565, 0.22848594],
[-0.41288949, 0.05927629, -0.59752415, 0.45548192],
[-0.35111661, 0.13681976, -0.73963359, 0.53842663],
[-0.58055457, -0.03494196, 0.59678369, -0.40245336],
[ 0.57615495, -0.03258459, -0.25033765, 0.20835347]])
These are the weights for 4 classes (output labels) across 6 training examples. I want the softmax probabilities to be calculated for each label for every training example. So, I tried something like this:
sft = softmax(w0) # calculating softmax for all 6 training examples
print(sft.shape)
print(sft)
softmax(w0[0]) # calculating softmax for only 1st training example
This prints:
(6, 4)
[[0.02550981 0.02338932 0.08968387 0.03245067] # <-- 1
[0.02177809 0.03240715 0.041004 0.0514721 ]
[0.02710354 0.04345954 0.0225341 0.06458847]
[0.0288306 0.04696364 0.01954893 0.07017419]
[0.02291976 0.03955183 0.0743912 0.02738788]
[0.07287233 0.03964518 0.03188757 0.0504462 ]]
array([0.1491508 , 0.13675272, 0.52436386, 0.18973262]) # <-- 2
I felt that the first row of the sft matrix should be the same as the output of softmax(w0[0]). That is, the line suffixed <-- 1 should be the same as <-- 2, as both correspond to the same training example. But it seems that softmax(w0) is calculating probabilities across the whole matrix, treating it as a single training example instead of interpreting each row as a separate training example.
How do I compute the softmax so that each row is treated independently of the others? What am I missing here?
e_x.sum() and np.max(x) will, by default, be applied over the full matrix, returning a scalar.
You can check this by looking at the difference between
>>> np.max(w0)
0.78373698
>>> np.max(w0, axis=1)
array([0.78373698, 0.22848594, 0.45548192, 0.53842663, 0.59678369,
0.57615495])
for instance.
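The same default applies to the sum. As a quick sanity check (just a sketch, reusing the sft variable from the question), you could run:

print(sft.sum())          # ~1.0 -> the whole matrix was treated as one softmax
print(sft.sum(axis=1))    # six per-row sums, none of which is 1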
You probably want to run this over every row, which you can do using the axis
argument:
def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1))
    return e_x / e_x.sum(axis=1)
However, if you do this you will run into broadcasting issues: NumPy doesn't know how to handle the operation
[[1,2],[3,4],[5,6]] - [7,8,9]
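With w0, the same mismatch shows up as a broadcasting error, roughly like this:

>>> w0 - np.max(w0, axis=1)
ValueError: operands could not be broadcast together with shapes (6,4) (6,)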
To keep the dimensions correct, you can use the keepdims
argument:
>>> np.max(w0, axis=1, keepdims=True)
array([[0.78373698],
[0.22848594],
[0.45548192],
[0.53842663],
[0.59678369],
[0.57615495]])
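Because the per-row maxima keep shape (6, 1), they now broadcast cleanly against the (6, 4) matrix; for instance:

>>> (w0 - np.max(w0, axis=1, keepdims=True)).shape
(6, 4)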
Now, applying all of that, you should be able to use:
def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)
and get this result:
>>> print(softmax(w0))
[[0.1491508 0.13675272 0.52436386 0.18973262]
[0.14849238 0.22096587 0.27958287 0.35095887]
[0.17188339 0.27560869 0.14290522 0.40960271]
[0.17418476 0.28373847 0.11810801 0.42396877]
[0.13954134 0.24080164 0.45291261 0.16674441]
[0.37398946 0.2034638 0.16365082 0.25889592]]
>>> print(softmax([w0[0]]))
[[0.1491508 0.13675272 0.52436386 0.18973262]]
(Notice that softmax now expects a 2-D array, so I had to wrap the single row in an outer list.)
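A quick way to verify that the two now agree (using the values printed above):

>>> np.allclose(softmax(w0)[0], softmax([w0[0]])[0])
True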
Edit: to generalise the input that the softmax
function can take, you can do some shape checks:
def softmax(x):
    x = np.array(x)
    if len(x.shape) == 1:
        x = x.reshape(1, -1)
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)
for instance. The reshape adds a dimension, similarly to passing [x] instead of x.
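With that version, the 1-D call from the question works directly, still returning a 2-D result:

>>> softmax(w0[0])
array([[0.1491508 , 0.13675272, 0.52436386, 0.18973262]])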
The more assumptions you make about the input data, the fewer checks you have to make, and the more efficient your function will be!
For instance, I also added a conversion to an array in case the data comes in as a list; if that's not needed, you can remove it for a small performance improvement.