I am working on the UCI heart disease dataset and have converted all the measurable values into z-scores. I want to replace the values that are greater than 3 or smaller than -3 with 3 and -3 respectively, or with the mean.
My sample code is:
import pandas as pd
import numpy as np
df= pd.DataFrame({'X': np.random.randn(10),'Y':np.random.randn(10)})
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, pd.DataFrame({'X': np.array([3, -3, 3.3, 4]), 'Y': np.array([-3.4, 2, 1, 5])})], ignore_index=True)
df['X'].apply(lambda x: x=3 if x>3 else (x = -3 if x<-3 else x))
But I'm receiving the following error:
File "<ipython-input-144-8d678556d1e7>", line 1
df['X'].apply(lambda x: x=3 if x>3 else (x= -3 if x<-3 else x))
^
SyntaxError: invalid syntax
How can I fix it?
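The lambda syntax is such that after x: you just state the function's return value, without assigning to x (x appears only in the conditions and as the fallback value in this case). A lambda body must be a single expression, and an assignment such as x=3 is a statement, which is why Python raises the SyntaxError.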
df['X'].apply(lambda x: 3 if x > 3 else (-3 if x < -3 else x))
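Note that apply returns a new Series, so the result has to be assigned back to df['X'] to take effect. As an alternative, the same capping can be done without a lambda. The following is a minimal sketch, not the only way to do it: the small hard-coded frame stands in for the z-scored columns, and "replace with mean" is assumed here to mean the mean of the values that fall inside [-3, 3], which the question does not specify.

```python
import pandas as pd

# Toy data standing in for the z-scored columns (values chosen for illustration)
df = pd.DataFrame({
    'X': [0.5, -1.2, 3.0, -3.0, 3.3, 4.0],
    'Y': [-3.4, 2.0, 1.0, 5.0, 0.1, -0.7],
})

# Option 1: cap everything above 3 at 3 and everything below -3 at -3
clipped = df['X'].clip(lower=-3, upper=3)

# Option 2: replace out-of-range values with the mean of the in-range values
# (assumption: "mean" refers to the mean of the non-outlier values)
in_range = df['X'].between(-3, 3)          # inclusive on both ends by default
inlier_mean = df['X'][in_range].mean()
mean_filled = df['X'].where(in_range, inlier_mean)

# Neither call modifies df in place; assign the result back to keep it
df['X'] = clipped
```

Series.clip and Series.where are vectorized, so they avoid calling a Python function once per row, which tends to matter on larger frames.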