I have tried the following code, but I couldn't figure out the difference between np.dot and np.multiply with np.sum.
Here is the np.dot code:
logprobs = np.dot(Y, np.log(A2).T) + np.dot(1.0 - Y, np.log(1 - A2).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)
Its output is
(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039]]
Here is the code for np.multiply with np.sum:
logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)
Its output is
()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039
I'm unable to understand why the type and shape differ when the resulting value is the same in both cases.
Even when I squeeze the former cost, its value becomes the same as in the latter case, but the type remains ndarray:
cost = np.squeeze(cost)
print(type(cost))
print(cost)
The output is
<class 'numpy.ndarray'>
0.6930587610394646
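A minimal standalone check (plain NumPy, using the cost value from above) shows what np.squeeze actually returns: it only removes the size-1 axes, so the result is still a 0-d ndarray, and .item() (or float()) is needed to get a plain Python float:

import numpy as np

cost = np.array([[0.6930587610394646]])  # same (1, 1) shape as the np.dot result
squeezed = np.squeeze(cost)              # 0-d ndarray, shape ()
print(type(squeezed))                    # <class 'numpy.ndarray'>
print(type(squeezed.item()))             # <class 'float'>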
What you're doing is calculating the binary cross-entropy loss, which measures how bad the predictions (here: A2) of the model are when compared to the true outputs (here: Y).
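Written out, both of your snippets compute the same average over the m examples (here y stands for the true labels in Y and a for the predictions in A2):

cost = -(1/m) * sum(y * log(a) + (1 - y) * log(1 - a))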
Here is a reproducible example for your case, which should explain why you get a scalar in the second case when using np.sum:
In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
In [90]: logprobs = np.dot(Y, np.log(A2).T) + np.dot(1.0 - Y, np.log(1 - A2).T)
# `np.dot` returns 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])
# assuming m = Y.shape[1] = 8 (the number of examples)
In [92]: cost = (-1/m) * logprobs
In [93]: cost
Out[93]: array([[ 0.09864328]])
In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
# `np.sum` returns a scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361
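If you want the second variant to keep the (1, 1) shape instead of collapsing to a scalar, np.sum also accepts a keepdims argument (a quick variation on the example above):

logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)),
                  keepdims=True)
logprobs  # array([[-0.78914626]]) -- shape (1, 1), matching the np.dot result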
Note that np.dot sums along only the inner dimensions, which match here: (1x8) and (8x1). So the 8s are summed away during the dot product (i.e. matrix multiplication), yielding a (1x1) result, which is just a scalar but is returned as a 2D array of shape (1, 1).
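A quick shape-only check with throwaway arrays (just to illustrate the rule, not tied to the data above):

a = np.ones((1, 8))
b = np.ones((8, 1))
np.dot(a, b).shape  # (1, 1): the matching inner dimension of 8 is summed away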
Also, and most importantly, note that here np.dot is exactly the same as np.matmul, since the inputs are 2D arrays (i.e. matrices):
In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)
In [108]: logprobs
Out[108]: array([[-0.78914626]])
In [109]: logprobs.shape
Out[109]: (1, 1)
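One caveat where the two are not interchangeable: np.matmul rejects scalar operands, while np.dot accepts them and falls back to plain multiplication:

np.dot(3, 4)       # 12
# np.matmul(3, 4)  # raises ValueError (scalar operands are not allowed)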
np.dot or np.matmul returns whatever the resulting array shape would be, based on the input arrays. Even with the out= argument it's not possible to return a scalar if the inputs are 2D arrays. However, we can use np.asscalar() on the result to convert it to a scalar, if the result array is of shape (1, 1) (or, more generally, a scalar value wrapped in an nD array):
In [123]: np.asscalar(logprobs)
Out[123]: -0.7891462576187036
In [124]: type(np.asscalar(logprobs))
Out[124]: float
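Side note: np.asscalar() has since been deprecated (and removed in NumPy 1.23+); ndarray.item() does the same job and is the recommended replacement:

logprobs.item()        # -0.7891462576187036
type(logprobs.item())  # <class 'float'>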
np.asscalar() works on any ndarray of size 1, no matter how many dimensions wrap the value:
In [127]: np.asscalar(np.array([[[23.2]]]))
Out[127]: 23.2
In [128]: np.asscalar(np.array([[[[23.2]]]]))
Out[128]: 23.2