python-3.x  pandas  data-cleaning

How to remove outliers with z-scores (above 3 or below -3) using the apply function


I am working on the UCI heart disease dataset and have converted all the measurable values to z-scores. I want to replace the values that are greater than 3 or smaller than -3 with 3 and -3 respectively, or with the mean.

My sample code is:

import pandas as pd
import numpy as np
df = pd.DataFrame({'X': np.random.randn(10), 'Y': np.random.randn(10)})
df = df.append(pd.DataFrame({'X': np.array([3, -3, 3.3, 4]),
                             'Y': np.array([-3.4, 2, 1, 5])}), ignore_index=True)
df['X'].apply(lambda x: x=3 if x>3 else (x = -3 if x<-3 else x))

But I'm receiving the following error:

File "<ipython-input-144-8d678556d1e7>", line 1
    df['X'].apply(lambda x: x=3 if x>3 else (x= -3 if x<-3 else x))
                                              ^
SyntaxError: invalid syntax

How can I fix it?


Solution

  • A lambda body must be a single expression whose value is returned, so an assignment such as x = 3 (a statement) is a syntax error there. After x:, state only the value to return, without repeating x (except in the conditions):

    df['X'].apply(lambda x: 3 if x > 3 else (-3 if x < -3 else x))
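
A minimal sketch of how the fix might be used end to end, under a few assumptions that are not in the original post: the sample values are hand-picked for illustration, the sample is built with pd.concat (DataFrame.append was removed in pandas 2.0), and the "replace with mean" variant uses the mean of the whole column, outliers included. Note that apply returns a new Series, so the result has to be assigned back. Series.clip is shown as a vectorized alternative that should give the same capped values.

```python
import pandas as pd

# Illustrative sample (hypothetical values, not the UCI data)
base = pd.DataFrame({"X": [0.5, -1.2], "Y": [0.1, -0.7]})
extra = pd.DataFrame({"X": [3, -3, 3.3, 4], "Y": [-3.4, 2, 1, 5]})
df = pd.concat([base, extra], ignore_index=True)

# The corrected lambda: returns a value instead of assigning to x
capped_apply = df["X"].apply(lambda x: 3 if x > 3 else (-3 if x < -3 else x))

# Vectorized alternative: clip limits every value to the interval [-3, 3]
capped_clip = df["X"].clip(lower=-3, upper=3)

# Variant from the question: replace outliers with the mean instead.
# Assumption: the mean is taken over the whole column, outliers included.
y_mean = df["Y"].mean()
mean_filled = df["Y"].mask(df["Y"].abs() > 3, y_mean)

# Assign back to keep the result; apply and clip do not modify df in place
df["X"] = capped_clip
df["Y"] = mean_filled
```

For a whole table of z-scores, df.clip(lower=-3, upper=3) applies the same cap to every numeric column at once.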