MinMaxScaler on a list of DataFrames

I have a list of DataFrames. All of them have identical numeric columns and represent different results of the same test, and I want to keep them separate. I want to fit a scikit-learn MinMaxScaler that takes into account the minimum and maximum values of each column across all the DataFrames. Does anyone have a solution for that?

Thanks,

MAK

Solution

You want to do the following:

1. Create a temporary DataFrame tmp as a concatenation of all the DFs in your list.
2. Fit the MinMaxScaler object on the tmp DF.
3. Scale (transform) each DF in the list using the fitted MinMaxScaler object.
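The three steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the sample data, the column names a, b, c and the variable names tmp, mms and scaled are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample data: three DataFrames with identical numeric columns
rng = np.random.default_rng(0)
dfs = [
    pd.DataFrame(rng.random((3, 3)) * 100 - 50, columns=["a", "b", "c"])
    for _ in range(3)
]

# 1. Temporary DataFrame holding the rows of every DF in the list
tmp = pd.concat(dfs, ignore_index=True)

# 2. Fit the scaler on the combined data, so each column's min/max is global
mms = MinMaxScaler().fit(tmp)

# 3. Transform each DF separately; wrap the resulting arrays back into
#    DataFrames so the original index and column labels are kept
scaled = [
    pd.DataFrame(mms.transform(df), index=df.index, columns=df.columns)
    for df in dfs
]
```

Because the scaler is fitted on the combined data, a column's values reach 0 and 1 only in the DataFrames that contain the overall minimum and maximum of that column.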

UPDATE:

Do you have a suggestion for fitting without creating a temporary DataFrame?

We can use the .partial_fit() method to fit on the data from all DFs iteratively, so the concatenated DataFrame is never built:

Creating a list of sample DFs (this assumes numpy is imported as np and pandas as pd):

In [100]: dfs = [pd.DataFrame(np.random.rand(3,3)*100 - 50) for _ in range(3)]

In [101]: dfs[0]
Out[101]:
           0          1          2
0  45.473162  42.366712  41.395652
1 -35.476703  43.777850 -36.363200
2   0.479528  14.861075   4.196630

In [102]: dfs[2]
Out[102]:
           0          1          2
0   6.888876 -24.454986 -39.794309
1  -8.988094 -34.426252 -24.760782
2  34.317689 -43.644643  44.243769

scaling:

In [103]: from sklearn.preprocessing import MinMaxScaler

In [104]: mms = MinMaxScaler()

In [105]: for df in dfs:
     ...:     mms.partial_fit(df)
     ...:

In [106]: scaled = [mms.transform(df) for df in dfs]

result:

In [107]: scaled[0]
Out[107]:
array([[1.        , 0.9838584 , 0.91065751],
       [0.07130264, 1.        , 0.03848462],
       [0.48381052, 0.66922958, 0.49341912]])

In [108]: scaled[1]
Out[108]:
array([[0.53340314, 0.8729412 , 0.62360548],
       [0.        , 0.39480025, 1.        ],
       [0.04767918, 0.10412712, 0.95859434]])

In [109]: scaled[2]
Out[109]:
array([[0.55734177, 0.2195048 , 0.        ],
       [0.37519322, 0.10544644, 0.16862177],
       [0.87201883, 0.        , 0.94260309]])
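As the outputs above show, transform returns NumPy arrays rather than DataFrames. If the scaled results should stay as separate DataFrames, one possible sketch is to wrap each array using the index and columns of its source DF. The sample data, the column names and the name scaled_dfs are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample data with the same shape as in the session above
rng = np.random.default_rng(0)
dfs = [
    pd.DataFrame(rng.random((3, 3)) * 100 - 50, columns=["a", "b", "c"])
    for _ in range(3)
]

# Fit incrementally: each call updates the running per-column min and max
mms = MinMaxScaler()
for df in dfs:
    mms.partial_fit(df)

# Transform each DF and restore its index and column labels
scaled_dfs = [
    pd.DataFrame(mms.transform(df), index=df.index, columns=df.columns)
    for df in dfs
]

# The fitted extremes are available for inspection
print(mms.data_min_)
print(mms.data_max_)
```

After the loop, mms.data_min_ and mms.data_max_ hold the per-column minimum and maximum over all DFs, which should match what fitting on the concatenated data would give.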