python pandas dataframe statistics

How to replace dataframe values based on index statistics


I have a dataframe like this:

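import pandas as pd

l1 = [1,2,3,4,5,6,7,8,9,10]
l2 = [11,12,13,14,15,16,17,18,19,20]
index = ['FORD','GM']
# l2 becomes the index, reset_index() turns it into a column, and .T makes each list a row
df = pd.DataFrame(l1,l2).reset_index().T
df.index = index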

I want to replace these integer values based on this:

For every row, if a value is less than that row's mean minus 2, it should become 'MINI'; otherwise it should become 'MEGA'.

Here the mean varies for every row.

The desired output looks like this:

         0     1     2     3     4     5     6     7     8     9
FORD  MINI  MINI  MINI  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA
GM    MINI  MINI  MINI  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA

Can anyone help me with this?


Solution

  • You can compare each row to its own mean minus 2 and, where a value is strictly less than that threshold (lt), assign 'MINI', otherwise 'MEGA', using numpy.where:

    import numpy as np

    out = df.copy()
    # axis=0 aligns the per-row thresholds with the row labels
    out[:] = np.where(df.lt(df.mean(axis=1).sub(2), axis=0), 'MINI', 'MEGA')
    

    Output:

             0     1     2     3     4     5     6     7     8     9
    FORD  MINI  MINI  MINI  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA
    GM    MINI  MINI  MINI  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA  MEGA
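
To make the row-wise comparison easier to follow, here is a self-contained sketch that splits it into steps. The names `threshold` and `mask` are illustrative and do not appear in the original answer. As a variation, and an assumption on my part rather than the answer's method, it builds a new DataFrame from the `numpy.where` result instead of assigning strings into a copy of the integer frame.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame directly: FORD holds 11..20, GM holds 1..10.
df = pd.DataFrame(
    [list(range(11, 21)), list(range(1, 11))],
    index=["FORD", "GM"],
)

# Per-row cutoff: each row's mean minus 2 (a Series indexed by row label).
threshold = df.mean(axis=1).sub(2)

# axis=0 aligns the Series with the row labels, so every row is
# compared against its own cutoff. lt is a strict "less than".
mask = df.lt(threshold, axis=0)

# Build a fresh frame of labels with the same index and columns.
out = pd.DataFrame(
    np.where(mask, "MINI", "MEGA"),
    index=df.index,
    columns=df.columns,
)

print(threshold)
print(out)
```

With this data the cutoffs are 13.5 for FORD and 3.5 for GM, so the first three values in each row fall below the cutoff and are labelled MINI.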