
Unexpected behavior when trying to normalize a column in numpy.array (version 1.17.4)


So, I was trying to normalize a specific column within a numpy array (i.e. divide every value in that column by the column's maximum, so the maximum becomes 1). I hoped this piece of code would do the trick:

import numpy as np

bar = np.arange(12).reshape(6,2)

bar
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

bar[:,1] = bar[:,1] / bar[:,1].max()
bar
array([[ 0,  0],
       [ 2,  0],
       [ 4,  0],
       [ 6,  0],
       [ 8,  0],
       [10,  1]])

It works as expected if every value is a float:

foo = np.array([[1.1,2.2],
               [3.3,4.4],
               [5.5,6.6]])
foo[:,1] = foo[:,1] / foo[:,1].max()

foo
array([[1.1       , 0.33333333],
       [3.3       , 0.66666667],
       [5.5       , 1.        ]])

I guess what I'm asking is: where does this default 'int' come from, and what am I missing here? (I'm taking this as a learning opportunity.)


Solution

  • If you simply execute:

    out = bar[:,1] / bar[:,1].max()
    print(out)
    # [0.09090909 0.27272727 0.45454545 0.63636364 0.81818182 1.        ]
    

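    This works fine because out is a newly created array: true division (/) always returns a float64 result, even when both operands are integers. But np.arange(12) gives you an array with NumPy's default integer dtype. The statement bar[:,1] = bar[:,1] / bar[:,1].max() writes those float results back into that integer array, so NumPy silently casts them to integers, truncating toward zero. Every value from 0.09 through 0.82 becomes 0 and only 1.0 survives as 1, which is why you get [0 0 0 0 0 1].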

    To create the array with a float dtype from the start:

    bar = np.arange(12, dtype='float').reshape(6,2)
    

    Alternatively, you can also use:

    bar = np.arange(12).reshape(6,2).astype('float')
    

    It's common to need a different dtype later in a program than the one you defined originally, and .astype() handles that. Note that it returns a new array rather than converting in place, so assign the result back (e.g. bar = bar.astype('float')).
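The whole mechanism described above can be checked in a few lines. This is a minimal sketch assuming a reasonably recent NumPy (the question used 1.17.4); the variable names ratio and bar_f are illustrative only. It shows that the division itself produces floats, that writing them back into the integer array truncates them, and that converting with .astype() first gives the expected normalized column.

```python
import numpy as np

# np.arange with an integer argument yields NumPy's default integer dtype
bar = np.arange(12).reshape(6, 2)
print(bar.dtype.kind)  # 'i' means signed integer

# True division produces a float64 result even for integer inputs
ratio = bar[:, 1] / bar[:, 1].max()
print(ratio.dtype)  # float64

# Assigning the float result back into the integer array silently casts it,
# truncating each value toward zero (0.09 -> 0, ..., 0.82 -> 0, 1.0 -> 1)
bar[:, 1] = ratio
print(bar[:, 1])  # [0 0 0 0 0 1]

# Fix: convert to float first; astype returns a new array, so keep the result
bar_f = np.arange(12).reshape(6, 2).astype(float)
bar_f[:, 1] = bar_f[:, 1] / bar_f[:, 1].max()
print(bar_f[:, 1])

# astype leaves the original array's dtype untouched
original = np.arange(3)
converted = original.astype(float)
print(original.dtype.kind, converted.dtype)  # i float64
```

Converting the whole array is the simplest choice here because every column then shares one float dtype; if only the normalized column needs to be float, keeping it as a separate array like out also works.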