I have a list of dataframes (all the dataframes have identical numeric columns; they represent different results of the same test, and I want to keep them separate). I want to train a scikit-learn MinMaxScaler that takes into account the minimum and maximum values of each column across all dataframes. Does anyone have a solution for that?
Thanks,
MAK
You want to do the following:

- create tmp as a concatenation of all your DFs from the list
- fit a MinMaxScaler object on tmp
- transform each DF using the fitted MinMaxScaler object
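The steps above can be sketched as follows (a minimal example; dfs is an assumed list of synthetic DataFrames standing in for your data):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# stand-in for your list of DFs with identical numeric columns
dfs = [pd.DataFrame(np.random.rand(3, 3) * 100 - 50) for _ in range(3)]

# 1. concatenate all DFs into one temporary frame
tmp = pd.concat(dfs, ignore_index=True)

# 2. fit the scaler on the concatenation, so it sees the
#    global per-column min/max across all DFs
mms = MinMaxScaler().fit(tmp)

# 3. transform each DF separately with the fitted scaler,
#    keeping the results separated
scaled = [mms.transform(df) for df in dfs]
```

Because the scaler was fitted on the concatenation, the minimum and maximum of each column taken over all the scaled arrays together are exactly 0 and 1, while an individual scaled DF need not span the full [0, 1] range.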
UPDATE:
Do you have a suggestion for training without creating a temporary dataframe?
We can make use of the .partial_fit() method in order to fit data from all DFs iteratively:
Creating a list of sample DFs:
In [100]: dfs = [pd.DataFrame(np.random.rand(3,3)*100 - 50) for _ in range(3)]
In [101]: dfs[0]
Out[101]:
           0          1          2
0  45.473162  42.366712  41.395652
1 -35.476703  43.777850 -36.363200
2   0.479528  14.861075   4.196630
In [102]: dfs[2]
Out[102]:
           0          1          2
0   6.888876 -24.454986 -39.794309
1  -8.988094 -34.426252 -24.760782
2  34.317689 -43.644643  44.243769
scaling:
In [103]: from sklearn.preprocessing import MinMaxScaler
In [104]: mms = MinMaxScaler()
In [105]: _ = [mms.partial_fit(df) for df in dfs]
In [106]: scaled = [mms.transform(df) for df in dfs]
result:
In [107]: scaled[0]
Out[107]:
array([[1.        , 0.9838584 , 0.91065751],
       [0.07130264, 1.        , 0.03848462],
       [0.48381052, 0.66922958, 0.49341912]])
In [108]: scaled[1]
Out[108]:
array([[0.53340314, 0.8729412 , 0.62360548],
       [0.        , 0.39480025, 1.        ],
       [0.04767918, 0.10412712, 0.95859434]])
In [109]: scaled[2]
Out[109]:
array([[0.55734177, 0.2195048 , 0.        ],
       [0.37519322, 0.10544644, 0.16862177],
       [0.87201883, 0.        , 0.94260309]])
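As a sanity check, a scaler fitted iteratively with .partial_fit() learns the same per-column statistics as one fitted on the full concatenation, so the two approaches are interchangeable (a small sketch using freshly generated synthetic DFs, not the exact values above):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

dfs = [pd.DataFrame(np.random.rand(3, 3) * 100 - 50) for _ in range(3)]

# iterative fit: no temporary concatenated frame needed
mms_iter = MinMaxScaler()
for df in dfs:
    mms_iter.partial_fit(df)

# reference fit on the full concatenation
mms_full = MinMaxScaler().fit(pd.concat(dfs, ignore_index=True))

# both scalers learn the same per-column min/max, hence the
# same scaling parameters
assert np.allclose(mms_iter.data_min_, mms_full.data_min_)
assert np.allclose(mms_iter.data_max_, mms_full.data_max_)
```

The iterative variant is also the one to reach for when the DFs do not all fit in memory at once, since .partial_fit() only ever needs to see one chunk at a time.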