Search code examples
pandasseriesshapespreprocessor

How can I apply MinMaxScaler in my dataset? group by NUM(ID) and for each columns


Num  Year  1   2   3   4   5   6   7   8   9   10  11  12
111  2009  0   0   667 584 343 245 0   0   231 742 0   482
111  2010  0   10  624 4   143 445 20  991 0   742 0   482
112  2008 .....

Dataset is like this. Num, Year : Multiindex in Dataframe(pandas)

And desired output is scaled data of columns from 1 to 12 using group by Num. So I can compare the Usage(for each months) of Each year only with same 'Num'.

print(df_pivot.groupby('Num')[1].apply(lambda x: MinMaxScaler().fit_transform(x.values.reshape(-1,1))))
print(df_pivot.groupby('Num')[1].apply(lambda x: MinMaxScaler().fit_transform(x.values.reshape(-1,1))).loc[11111])
 ########
First Output is : 
11111    [[0.0], [0.0], [0.8012012012012012], [0.690210...
11112    [[0.5682020471740098], [0.5516243880729862], [...
11115    [[0.0], [0.0], [0.9999999999999999], [0.747932...
-----------------
Second Output is :
[[0.        ]
 [0.97836918]
 [0.91226151]
 [1.        ]
 [0.68939118]
 [0.51707376]
 [0.50893183]]

If look at the First Output, there is no Year index but only the Num index. So the result is returned in form of pandas.Series but this is not the desired output.

Can somebody help to get the data like??

11111   2010   0.0  0.3  0.4 1.0  0.5 ....
11111   2011   0.8  ....
11111   2012   0.2  ....
....

Solution

  • Instead of: df_pivot.groupby('Num')[1] you have to write: df_pivot.groupby(['Num','Year'])

    For example:

    print(df_pivot.groupby(['Num','Year']).apply(lambda x: MinMaxScaler().fit_transform(x.values.reshape(-1,1)))))