
How can I split my normalization in two according to column values?


Hi, I have a column of data in pandas with a hugely skewed distribution (figure: data distribution).

I split the data in two at a cutoff value of 1000 and plotted the distribution of each group as two panels (figure not included).

Now I want to normalize the values to the range 0-1, but with a 'differential' normalization: the left-panel values should be normalized to 0-0.5 and the right-panel values to 0.5-1, all in the same column. How can I do it?


Solution

  • It's not pretty, but it works.

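    import pandas as pd
    
    df = pd.DataFrame({'dataExample': [0, 1, 2, 1001, 1002, 1003]})
    
    # Values at or below the cutoff: min-max scale, then halve to land in [0, 0.5]
    less1000 = df.loc[df['dataExample'] <= 1000]
    df.loc[df['dataExample'] <= 1000, 'datanorm'] = (less1000['dataExample'] - less1000['dataExample'].min()) / ((less1000['dataExample'].max() - less1000['dataExample'].min()) * 2)
    
    # Values above the cutoff: min-max scale, halve, then shift by 0.5 to land in [0.5, 1]
    high1000 = df.loc[df['dataExample'] > 1000]
    df.loc[df['dataExample'] > 1000, 'datanorm'] = (high1000['dataExample'] - high1000['dataExample'].min()) / ((high1000['dataExample'].max() - high1000['dataExample'].min()) * 2) + 0.5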
    
    output:
       dataExample  datanorm
    0            0      0.00
    1            1      0.25
    2            2      0.50
    3         1001      0.50
    4         1002      0.75
    5         1003      1.00
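
If the same split is needed in more than one place, the two branches can be folded into one helper. The following is only a sketch, not part of the original answer: the function name `piecewise_minmax` and its `cutoff` parameter are made up for illustration, and it assumes that a group whose values are all identical should sit at the start of its half-range (an arbitrary choice that also avoids dividing by zero).

```python
import pandas as pd


def piecewise_minmax(values: pd.Series, cutoff: float = 1000) -> pd.Series:
    """Scale values <= cutoff into [0, 0.5] and values > cutoff into [0.5, 1].

    Hypothetical helper: each group is min-max scaled on its own,
    halved, and shifted by the offset of its half-range.
    """
    values = values.astype(float)
    low_mask = values <= cutoff
    result = pd.Series(index=values.index, dtype=float)  # starts as all NaN

    # (row mask, offset of the target half-range)
    for mask, offset in ((low_mask, 0.0), (~low_mask, 0.5)):
        group = values[mask]
        if group.empty:
            continue
        span = group.max() - group.min()
        if span == 0:
            # Assumption: identical values map to the start of the half-range
            result.loc[mask] = offset
        else:
            scaled = (group - group.min()) / span * 0.5 + offset
            result.loc[mask] = scaled.to_numpy()
    return result


df = pd.DataFrame({'dataExample': [0, 1, 2, 1001, 1002, 1003]})
df['datanorm'] = piecewise_minmax(df['dataExample'], cutoff=1000)
print(df)
```

On the example frame this should give the same `datanorm` column as the output above. Because each group's own minimum is subtracted, it also covers data whose lower group does not start at 0.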