Hi, I have a column of data in pandas with a hugely skewed distribution.
I split the data in two at a cutoff value of 1000 and plotted the distribution of each group.
Now I want to normalize the values to the range 0-1, but with a 'differential' normalization: the values at or below the cutoff (left panel) should be normalized to 0-0.5 and the values above the cutoff (right panel) to 0.5-1, all in the same column. How can I do it?
It's not pretty, but it works.
import pandas as pd

df = pd.DataFrame({'dataExample': [0, 1, 2, 1001, 1002, 1003]})
# Values <= 1000 are scaled to [0, 0.5]; this assumes the group's minimum is 0.
less1000 = df.loc[df['dataExample'] <= 1000]
df.loc[df['dataExample'] <= 1000, 'datanorm'] = less1000['dataExample'] / (less1000['dataExample'].max() * 2)
# Values > 1000 are min-max scaled to [0, 0.5], then shifted up to [0.5, 1].
high1000 = df.loc[df['dataExample'] > 1000]
df.loc[df['dataExample'] > 1000, 'datanorm'] = ((high1000['dataExample'] - high1000['dataExample'].min()) / ((high1000['dataExample'].max() - high1000['dataExample'].min()) * 2) + 0.5)
Output:
   dataExample  datanorm
0            0      0.00
1            1      0.25
2            2      0.50
3         1001      0.50
4         1002      0.75
5         1003      1.00
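
If you need this more than once, the same idea can be wrapped in a small helper. This is only a sketch: the names split_normalize and half_scale are made up for illustration, and unlike the snippet above it also subtracts the minimum of the lower group, so it does not assume that group starts at 0. I also chose to map a group whose values are all identical to the bottom of its range, which may or may not be what you want.

```python
import pandas as pd


def split_normalize(values: pd.Series, cutoff: float) -> pd.Series:
    """Map values <= cutoff onto [0, 0.5] and values > cutoff onto [0.5, 1]."""

    def half_scale(part: pd.Series) -> pd.Series:
        # Min-max scale one group to [0, 0.5].
        # Assumption: an empty or constant group is mapped to 0.
        if part.empty or part.max() == part.min():
            return pd.Series(0.0, index=part.index)
        return (part - part.min()) / (2 * (part.max() - part.min()))

    low = values <= cutoff
    result = pd.Series(index=values.index, dtype=float)  # starts as all NaN
    result.loc[low] = half_scale(values[low])
    result.loc[~low] = half_scale(values[~low]) + 0.5
    return result


df = pd.DataFrame({'dataExample': [0, 1, 2, 1001, 1002, 1003]})
df['datanorm'] = split_normalize(df['dataExample'], cutoff=1000)
print(df)
```

Note that, as in the output above, the largest value of the lower group and the smallest value of the upper group both end up at exactly 0.5.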