For a two-dimensional array, I'm trying to write a standardize function that works both row-wise and column-wise. I'm not sure what to do when it is called with axis=1 (row-wise).
def standardize(x, axis=None):
    if axis == 0:
        return (x - x.mean(axis)) / x.std(axis)
    else:
        ?????
I tried changing axis to axis = 1 in this part: (x - x.mean(axis)) / x.std(axis)
But then I got the following error:
ValueError: operands could not be broadcast together with shapes (4,3) (4,)
Can someone explain to me what to do as I'm still a beginner?
The reason for the error you are seeing is that you cannot calculate
x - x.mean(1)
because
x.shape = (4, 3)
x.mean(1).shape = (4,) # mean(), sum(), std() etc. remove the dimension they are applied to
However, the operation would work if mean() kept the dimension it is applied to, resulting in
x.mean(1).shape = (4, 1)
A (4, 1) array can be broadcast against a (4, 3) array, because the length-1 axis is stretched to match the 3 columns (look up the NumPy broadcasting rules).
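To make the shapes concrete, here is a small sketch using a made-up 4x3 array (the values are only illustrative). It reproduces the failing subtraction and then adds the missing axis back by hand with np.newaxis, which is one possible way to get the (4, 1) shape:

```python
import numpy as np

# Made-up example data with the same shape as in the question: 4 rows, 3 columns.
x = np.arange(12, dtype=float).reshape(4, 3)

row_means = x.mean(1)            # one mean per row; the column axis is dropped
print(x.shape, row_means.shape)  # (4, 3) (4,)

try:
    x - row_means                # (4, 3) against (4,) cannot be broadcast
except ValueError as err:
    print("broadcast failed:", err)

# Re-insert the dropped axis so the means form a column of shape (4, 1).
row_means_col = row_means[:, np.newaxis]
centered = x - row_means_col     # (4, 3) against (4, 1) broadcasts row by row
print(row_means_col.shape, centered.shape)
```

After the subtraction, every row of centered has a mean of zero, which is the row-wise behaviour the question asks for.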
Because this is such a common issue, the NumPy developers introduced a parameter that does exactly that: keepdims=True, which you should use in both mean() and std():
def standardize(x, axis=None):
    return (x - x.mean(axis, keepdims=True)) / x.std(axis, keepdims=True)
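As a quick sanity check, the function can be run on random data along each axis. This is only a sketch: the array, the seed, and the variable names by_col, by_row, and overall are made up for illustration.

```python
import numpy as np

def standardize(x, axis=None):
    # keepdims=True keeps the reduced axis with length 1 so the result broadcasts against x
    return (x - x.mean(axis, keepdims=True)) / x.std(axis, keepdims=True)

# Made-up test data: 4 rows, 3 columns.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))

by_col = standardize(x, axis=0)   # each column gets mean 0 and std 1
by_row = standardize(x, axis=1)   # each row gets mean 0 and std 1
overall = standardize(x)          # axis=None: the whole array gets mean 0 and std 1

print(np.allclose(by_col.mean(0), 0), np.allclose(by_col.std(0), 1))
print(np.allclose(by_row.mean(1), 0), np.allclose(by_row.std(1), 1))
print(np.allclose(overall.mean(), 0), np.allclose(overall.std(), 1))
```

Note that the same one-line function covers axis=0, axis=1, and the default axis=None, so no if/else branch is needed.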