python, scikit-learn, histogram, normalize

plt.hist shows a strange plot after preprocessing.normalize


I'm new to Python. I ran this code:

    import numpy as np
    import matplotlib.pyplot as plt

    test1 = np.array([95, 91, 104, 93, 85, 107, 97, 90, 86, 93, 86, 90, 88, 89, 94, 96, 89, 99, 104, 101, 84, 84, 94, 87, 99, 85, 83, 107, 102, 80, 89, 88, 93, 101, 87, 100, 82, 90, 106, 81, 95])
    plt.hist(test1)
    plt.show()

And I got this histogram:

[image: histogram of test1]

After I normalized the data and plotted it again:

    from sklearn import preprocessing

    plt.gcf().clear()  # clear the previous figure
    test2 = preprocessing.normalize([test1])
    plt.hist(test2)
    plt.show()

[image: histogram of test2]

The new plot has a different shape, and in the histogram every value seems to occur exactly once, which looks strange compared to the first plot. I expected something similar to the first plot, but with values ranging from 0 to 1. What am I doing wrong?
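One way to see where the unexpected plot comes from is to inspect what preprocessing.normalize actually returns. This is a minimal sketch, assuming the same test1 data; np.histogram with 10 bins (Matplotlib's default bin count) stands in for plt.hist so the bin counts can be compared without drawing anything.

```python
import numpy as np
from sklearn import preprocessing

test1 = np.array([95, 91, 104, 93, 85, 107, 97, 90, 86, 93, 86, 90, 88, 89,
                  94, 96, 89, 99, 104, 101, 84, 84, 94, 87, 99, 85, 83, 107,
                  102, 80, 89, 88, 93, 101, 87, 100, 82, 90, 106, 81, 95])

# Wrapping test1 in a list creates ONE sample (row) with 41 features (columns).
test2 = preprocessing.normalize([test1])
print(test2.shape)  # (1, 41)

# normalize() rescales each row to unit L2 norm (the default norm="l2"),
# i.e. it divides the whole row by its Euclidean length.
print(np.allclose(test2[0], test1 / np.linalg.norm(test1)))  # True

# plt.hist treats each column of a 2D array as a separate dataset, so a
# (1, 41) array becomes 41 datasets holding one value each.
# Flattened to 1D, the normalized data bins exactly like the raw data,
# because dividing by a positive constant only rescales the x-axis.
counts_raw, _ = np.histogram(test1, bins=10)
counts_norm, _ = np.histogram(test2.ravel(), bins=10)
print(np.array_equal(counts_raw, counts_norm))  # True
```

So normalize did what it is designed to do; the odd picture comes from handing a (1, 41) array to plt.hist, and even flattened, the L2-normalized values do not span 0 to 1.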


Solution

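  • Here is one solution. preprocessing.normalize does not rescale values to the range 0 to 1: it scales each sample (row) to unit norm, L2 by default. Because [test1] is a single row of 41 values, the whole vector is divided by its Euclidean length and the result has shape (1, 41). plt.hist treats each column of a 2D array as a separate dataset, so you get 41 datasets with one value each. What you need is MinMaxScaler, whose default output range is (0, 1). For more information, see the MinMaxScaler page in the official scikit-learn documentation.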

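    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import preprocessing

    test1 = np.array([95, 91, 104, 93, 85, 107, 97, 90, 86, 93, 86, 90, 88, 89, 94, 96, 89, 99, 104, 101, 84, 84, 94, 87, 99, 85, 83, 107, 102, 80, 89, 88, 93, 101, 87, 100, 82, 90, 106, 81, 95])
    min_max_scaler = preprocessing.MinMaxScaler()
    # MinMaxScaler expects a 2D array of shape (n_samples, n_features),
    # so reshape the 1D data into a single column of 41 samples.
    test2 = min_max_scaler.fit_transform(test1.reshape(-1, 1))
    plt.hist(test2)
    plt.show()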
    

    Output:

    [image: histogram of the min-max scaled values]
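With its default feature_range of (0, 1), MinMaxScaler maps each column through x' = (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. This is a minimal sketch, assuming the same test1 data, that checks the formula computed by hand against the scaler and shows the feature_range parameter for a different target interval:

```python
import numpy as np
from sklearn import preprocessing

test1 = np.array([95, 91, 104, 93, 85, 107, 97, 90, 86, 93, 86, 90, 88, 89,
                  94, 96, 89, 99, 104, 101, 84, 84, 94, 87, 99, 85, 83, 107,
                  102, 80, 89, 88, 93, 101, 87, 100, 82, 90, 106, 81, 95])

# Min-max scaling by hand: x' = (x - min) / (max - min).
manual = (test1 - test1.min()) / (test1.max() - test1.min())

# MinMaxScaler works column by column, hence the reshape to one column.
scaler = preprocessing.MinMaxScaler()  # feature_range defaults to (0, 1)
scaled = scaler.fit_transform(test1.reshape(-1, 1))

print(scaled.shape)                         # (41, 1)
print(np.allclose(scaled.ravel(), manual))  # True

# A different target interval can be requested via feature_range.
scaler_pm1 = preprocessing.MinMaxScaler(feature_range=(-1, 1))
scaled_pm1 = scaler_pm1.fit_transform(test1.reshape(-1, 1))
print(np.isclose(scaled_pm1.min(), -1.0), np.isclose(scaled_pm1.max(), 1.0))  # True True
```

Because the scaled result keeps the (41, 1) column layout, plt.hist sees a single dataset; plt.hist(test2.ravel()) produces the same histogram if you prefer a 1D array.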