I have the following type of dataframe:
   Channel  Region  Fresh  Milk  Grocery  Frozen  Detergents_Paper  Delicassen
0        2       3  12669  9656     7561     214              2674        1338
1        2       3   7057  9810     9568    1762              3293        1776
2        2       3   6353  8808     7684    2405              3516        7844
3        1       3  13265  1196     4221    6404               507        1788
4        2       3  22615  5410     7198    3915              1777        5185
I would like to do two things:
1) Rescale only certain columns, not all of them, so that their values lie between 0 and 1. I want to select those columns by position rather than by name: imagine I need to change 200 columns and don't want to type every name.
The code I tried was:
df /= df.max()
But that rescales every column to the range (0, 1), not only the ones I want, and I can't find a way to apply it to just a subset.
2) Rescale columns independently of each other. What I mean is that I would like one scale for Milk and a different one for Frozen, for instance.
For example, I might divide one column by 100 because its values are too large, but divide another column by 10 because 100 would be too much. How would I do that?
For 1, you can select a list of columns like this:
df[['Milk','Frozen','Grocery']]
Therefore, to rescale only those three columns, use:
df[['Milk','Frozen','Grocery']] -= df[['Milk','Frozen','Grocery']].min()
df[['Milk','Frozen','Grocery']] /= df[['Milk','Frozen','Grocery']].max()
This already scales each column independently of the others, if that is what your second question means.
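As a minimal sketch of the min-max rescaling above, using the five sample rows from the question. It casts the selected columns to float first and uses a plain assignment instead of the in-place operators, which is an assumption made here to avoid any integer dtype surprises:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Channel": [2, 2, 2, 1, 2],
    "Region": [3, 3, 3, 3, 3],
    "Fresh": [12669, 7057, 6353, 13265, 22615],
    "Milk": [9656, 9810, 8808, 1196, 5410],
    "Grocery": [7561, 9568, 7684, 4221, 7198],
    "Frozen": [214, 1762, 2405, 6404, 3915],
    "Detergents_Paper": [2674, 3293, 3516, 507, 1777],
    "Delicassen": [1338, 1776, 7844, 1788, 5185],
})

cols = ["Milk", "Frozen", "Grocery"]
subset = df[cols].astype(float)

# min() and max() are computed per column, so each column gets its own scale.
df[cols] = (subset - subset.min()) / (subset.max() - subset.min())

print(df[cols])
```

The columns not listed in cols (Fresh, for example) keep their original values. Note that a column with a single constant value would give a division by zero here.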
EDIT:
If you want to select the first 200 columns of your dataframe, you can use df.columns, which gives you the column labels and can be sliced by position:
df[df.columns[:200]] -= df[df.columns[:200]].min()
df[df.columns[:200]] /= df[df.columns[:200]].max()
Calling max on a DataFrame returns a Series holding the maximum of each column. With the code above, the maximum value in each of the selected columns is therefore exactly 1.
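To illustrate position-based selection, here is a small sketch on a cut-down version of the question's data. The positions 2:5 and [2, 4] are arbitrary choices for the example, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Channel": [2, 2, 1],
    "Region": [3, 3, 3],
    "Fresh": [12669, 7057, 13265],
    "Milk": [9656, 9810, 1196],
    "Grocery": [7561, 9568, 4221],
})

# Pick column labels by position: a slice, or an arbitrary list of positions.
by_slice = df.columns[2:5]      # Fresh, Milk, Grocery
by_list = df.columns[[2, 4]]    # Fresh, Grocery

# max() returns a Series indexed by column label, one value per column.
col_max = df[by_slice].max()

# The division aligns col_max with the columns, so each column is
# divided by its own maximum; Channel and Region are left alone.
df[by_slice] = df[by_slice] / col_max

print(df)
```

Either by_slice or by_list can be used as the key in df[...], so the same pattern covers "the first 200 columns" as well as a scattered set of positions.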
If you don't want to divide by each column's max but instead divide the first column by n1, the second column by n2, and so on, you can use the same notation:
df[df.columns[:4]] /= [n1,n2,n3,n4]
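A variant of the same idea, sketched with a Series of divisors keyed by column name so the pairing does not depend on list order. The divisors 100 and 10 are the illustrative values from the question, and the two-row frame is a shortened copy of its data:

```python
import pandas as pd

df = pd.DataFrame({
    "Fresh": [12669, 7057],
    "Milk": [9656, 9810],
    "Frozen": [214, 1762],
})

# One divisor per column, chosen only for illustration.
divisors = pd.Series({"Milk": 100, "Frozen": 10})

# Division aligns on column labels, so Milk is divided by 100 and
# Frozen by 10 regardless of the order in which they are listed.
df[divisors.index] = df[divisors.index] / divisors

print(df)
```

With a plain list, as in the line above, the divisors are matched to the columns purely by position, so the list must have exactly as many entries as there are selected columns.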