
Normalizing a dataframe


I want to normalize my dataframe, but the normalization should be done separately for every block of 16 rows. The dataframe has 16*1550 rows and 17 columns. I implemented it with the following code, but it gives a warning. Is this the correct way to do it?

for n in range(1550):
    data_features[16*n:16*(n+1)][:] = (data_features[16*n:16*(n+1)][:] - data_features[16*n:16*(n+1)][:].mean())/data_features[16*n:16*(n+1)][:].std()

Solution

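  • The way you access the dataframe is the problem. data_features[16*n:16*(n+1)][:] is chained indexing: the first [...] returns an intermediate object that may be a copy, and assigning through the second [...] is what typically triggers pandas' SettingWithCopyWarning (and the assignment may not reach data_features at all). To modify cells, always use loc or iloc (or, for single cells, at and iat) in a single indexing step, and NEVER chain selections such as selecting rows from a column. Since you want to normalize by blocks, select each block of 16 rows with iloc. So a simple fix could be: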

    for n in range(1550):
        # Positional slice holding the n-th block of 16 rows
        block = data_features.iloc[16*n:16*(n+1)]
        # Per-column z-score within this block, written back in one iloc step
        data_features.iloc[16*n:16*(n+1)] = (block - block.mean()) / block.std()
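If you want to avoid the Python loop, a vectorized sketch is to label every row with its block number and let groupby(...).transform compute each block's mean and standard deviation. This assumes all 17 columns are numeric and that blocks are defined by row position, not by index labels; the helper name normalize_blocks and the random example data are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd


def normalize_blocks(df: pd.DataFrame, block_size: int = 16) -> pd.DataFrame:
    """Hypothetical helper: z-score each column within consecutive blocks of rows.

    Assumes every column is numeric. Blocks are formed by row position.
    """
    # Rows 0..15 -> block 0, rows 16..31 -> block 1, and so on.
    block_id = np.arange(len(df)) // block_size
    grouped = df.groupby(block_id)
    # transform broadcasts each block's statistic back to every row of that block;
    # "std" uses ddof=1, the same default as DataFrame.std() in the loop version.
    return (df - grouped.transform("mean")) / grouped.transform("std")


# Illustrative data only: 3 blocks of 16 rows and 17 columns of random values.
rng = np.random.default_rng(0)
data_features = pd.DataFrame(rng.normal(size=(16 * 3, 17)))
normalized = normalize_blocks(data_features)
```

transform keeps the original shape and index, so the result can be assigned back with data_features = normalize_blocks(data_features). If the row count were not a multiple of 16, the last block would simply be shorter, and a block containing a single row would produce NaN because its sample standard deviation is undefined.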