I wonder how the MinMaxScaler from sklearn works on a NumPy array.
Does it scale using the min and max values of each row, or the min and max values of the entire data set?
import sklearn.preprocessing

# Fetch the data as a pandas DataFrame.
dataframe = self.fetch_symbol(
    symbol=symbol,
    period=None,
    lookup=False,
)
# Select the feature columns as a NumPy array.
X = dataframe[self.columns].to_numpy()
# Apply the min-max scaler.
scaler = sklearn.preprocessing.MinMaxScaler()
X = scaler.fit_transform(X)
Below is a dump of the X data array, and there is no 1.0 present. How could that be explained if the array is scaled per row?
[[[0.16046406 0.15805957 0.13419023 0.13800743 0.42891535 0.11922597]
[0.13934693 0.17908731 0.14614396 0.1923503 0.42822784 0.12399251]
[0.1925308 0.17908731 0.15501285 0.14426272 0.42806807 0.12856839]
[0.14469139 0.19340406 0.15070694 0.1633544 0.42789004 0.13296123]]
[[0.14742879 0.24297456 0.13553985 0.125562 0.48300352 0.30485521]
[0.1262465 0.16483446 0.12275064 0.16348472 0.4821922 0.28753448]
[0.16365769 0.19787805 0.1559126 0.19736756 0.48006021 0.26733746]
[0.19741902 0.22306021 0.20533419 0.21926109 0.47704036 0.24956408]]
[[0.19921137 0.21839448 0.18669666 0.18648596 0.41883789 0.11741573]
[0.18666493 0.18279369 0.17217224 0.18987489 0.41481457 0.11741573]
[0.18953269 0.2098939 0.19248072 0.1989151 0.41027914 0.12218477]
[0.1991136 0.2071456 0.18470437 0.21965205 0.40481333 0.12676305]]
...
[[0.34682917 0.33175915 0.36797013 0.35728155 0.40061129 0.34991894]
[0.34269779 0.32821724 0.36283865 0.35490831 0.40061115 0.34832607]
[0.33908283 0.32388823 0.35899004 0.35490831 0.40061589 0.34679691]
[0.33980583 0.32369146 0.36625964 0.35490831 0.40062501 0.34532891]]
[[0.9136542 0.87032664 0.93499907 0.93182309 0.73167466 0.84121732]
[0.89299731 0.85714286 0.92259798 0.92944984 0.73307946 0.88873786]
[0.88989878 0.84868162 0.91468695 0.90981661 0.73486931 0.88873786]
[0.87110101 0.82979142 0.91618363 0.90981661 0.73669497 0.88641553]]
[[0.62920884 0.59937033 0.64507025 0.63667745 0.59950843 0.63437614]
[0.61412931 0.60232192 0.64507025 0.65738943 0.59995533 0.63049207]
[0.63168767 0.6090122 0.66548928 0.66321467 0.60035125 0.62691872]
[0.63499277 0.6111767 0.66666524 0.65846818 0.60061939 0.62363125]]]
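MinMaxScaler scales by column. Per the documentation, the min and max are taken along axis 0: (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)). With the default feature_range of (0, 1), each column therefore contains a 0.0 and a 1.0 somewhere in the full data set, but an individual row does not have to contain either.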
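To check the formula above against the scaler itself, here is a minimal sketch. The sample array is made up for illustration and is not the data from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample data: 4 rows (samples) x 3 columns (features).
X = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 40.0, 300.0],
    [4.0, 20.0, 200.0],
    [3.0, 30.0, 500.0],
])

# Scale with the default feature_range of (0, 1).
scaled = MinMaxScaler().fit_transform(X)

# The same computation written out by hand, taking min/max along axis 0.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# True if the scaler and the column-wise formula agree.
matches_formula = np.allclose(scaled, manual)
print(matches_formula)

# Per-column minimum and maximum after scaling.
print(scaled.min(axis=0), scaled.max(axis=0))
```

If the two results agree, the scaler is working column by column, so the per-column minima and maxima of the scaled array should be 0 and 1.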
As further proof, running fit_transform on the array np.array([[1,0],[3,5]]) outputs np.array([[0,0],[1,1]]).
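The same small array, extended with one extra row, may help explain why a row in the dump can lack a 1.0. This is only a sketch: the third row is an assumption added for illustration, and the transpose trick is just one way to get per-row scaling if that is what is wanted.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The array from the answer plus one hypothetical extra row.
X = np.array([
    [1.0, 0.0],
    [3.0, 5.0],
    [2.0, 1.0],
])

# Default behaviour: each column is scaled independently.
scaled = MinMaxScaler().fit_transform(X)
print(scaled)

# The last row holds neither a column minimum nor a column maximum,
# so it contains neither 0.0 nor 1.0 after scaling.
last_row = scaled[-1]
print(last_row)

# Per-row scaling, if desired: transpose so rows become columns,
# scale, then transpose back.
row_scaled = MinMaxScaler().fit_transform(X.T).T
print(row_scaled)
```

With column-wise scaling, a 1.0 only appears in the row that holds a column's maximum, so any subset of rows (such as the part of a dump that is printed) can be free of 1.0 values.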