Rescaling to (0,1) certain columns from Pandas Python dataframe

I have the following type of dataframe:

  Channel   Region  Fresh   Milk    Grocery Frozen  Detergents_Paper    Delicassen
0   2         3     12669   9656    7561    214        2674             1338
1   2         3     7057    9810    9568    1762       3293             1776
2   2         3     6353    8808    7684    2405       3516             7844
3   1         3     13265   1196    4221    6404       507              1788
4   2         3     22615   5410    7198    3915       1777             5185

I would like to do two things:

1) I want to rescale only certain columns, not all of them, so that their values lie between 0 and 1. I would like to select those columns by position rather than by name: imagine I need to change 200 columns and don't want to type every name.

The code I tried was:

df /= df.max()

But that rescales every column to (0,1), not only the ones I want, and I can't find a way to select just some of them.

2) I would also like to rescale columns independently of one another. That is, I would like one scale only for Milk and a different one only for Frozen, for instance.

For example, I might divide one column by 100 because its values are too big, but divide another column by 10 because 100 would be too much. How would I do that?

Solution

For 1, you can select a list of columns like this:

df[['Milk','Frozen','Grocery']]

Therefore, to rescale only those three columns, use:

df[['Milk','Frozen','Grocery']] -= df[['Milk','Frozen','Grocery']].min()
df[['Milk','Frozen','Grocery']] /= df[['Milk','Frozen','Grocery']].max()
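
As a minimal, self-contained sketch of the same min-max rescaling, the snippet below rebuilds the five sample rows from the question and scales the three columns. It assigns the result back in one step instead of using -= and /=; that is my choice for the illustration, so the integer columns are simply replaced by float ones. The names cols, col_min and col_range are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Channel": [2, 2, 2, 1, 2],
    "Region": [3, 3, 3, 3, 3],
    "Fresh": [12669, 7057, 6353, 13265, 22615],
    "Milk": [9656, 9810, 8808, 1196, 5410],
    "Grocery": [7561, 9568, 7684, 4221, 7198],
    "Frozen": [214, 1762, 2405, 6404, 3915],
    "Detergents_Paper": [2674, 3293, 3516, 507, 1777],
    "Delicassen": [1338, 1776, 7844, 1788, 5185],
})

cols = ["Milk", "Frozen", "Grocery"]

# min() and max() return one value per column, so each column gets its own scale
col_min = df[cols].min()
col_range = df[cols].max() - col_min

# Replace only the selected columns; all other columns are left untouched
df[cols] = (df[cols] - col_min) / col_range

print(df[cols])
```
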

This method already scales each column independently of the others, because min and max are computed per column, if that is what your second question means.

EDIT:

If you want to select the first 200 columns of your dataframe by position, you can use df.columns, which gives you the column labels, and slice it:

df[df.columns[:200]] -= df[df.columns[:200]].min()
df[df.columns[:200]] /= df[df.columns[:200]].max()
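
The positions do not have to be a leading slice. The sketch below wraps the same idea in a small helper, minmax_by_position (a hypothetical name, not a pandas function), that accepts either a slice or a list of integer positions. The toy dataframe is made up for the illustration.

```python
import pandas as pd

def minmax_by_position(df, positions):
    """Return a copy of df with the columns at `positions` min-max scaled to [0, 1]."""
    out = df.copy()
    # df.columns can be indexed by a slice or by a list of integer positions
    cols = out.columns[positions]
    col_min = out[cols].min()
    col_range = out[cols].max() - col_min
    out[cols] = (out[cols] - col_min) / col_range
    return out

# Toy data, invented for this example
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [10, 20, 40],
    "c": [5, 5, 9],
    "d": [0, 50, 100],
})

scaled = minmax_by_position(df, [1, 3])          # second and fourth columns
by_slice = minmax_by_position(df, slice(0, 2))   # first two columns
print(scaled)
```

Note that a column whose values are all equal has a range of zero, so this sketch would turn it into NaN; handle that case separately if it can occur in your data.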

Calling max on a dataframe returns a Series holding the maximum of each column, and the division is aligned column by column. Therefore, after the code above, each selected column has a minimum of exactly 0 and a maximum of exactly 1.

If you don't want to divide by the max of each column but instead divide the first column by n1, the second column by n2, and so on, you can use the same notation:

df[df.columns[:4]] /= [n1,n2,n3,n4]
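
If you would rather tie each divisor to a column name than rely on list order, one possible variant (a sketch, not the only way) is to keep the divisors in a Series indexed by column name. The divisors 100 and 10 come from the question; the two-row dataframe is a cut-down version of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Milk": [9656, 9810],
    "Frozen": [214, 1762],
    "Fresh": [12669, 7057],
})

# One divisor per column; the Series index says which column each one applies to
divisors = pd.Series({"Milk": 100, "Frozen": 10})

# Division aligns on column labels, so Milk is divided by 100 and Frozen by 10
df[divisors.index] = df[divisors.index] / divisors

print(df)
```
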