python data-mining outliers data-preprocessing

Removing outliers from numerical features


Hi, I'm trying to remove outliers from the numerical columns of my dataset, but when I run my code the whole dataset is removed. Can anyone tell me what I'm doing wrong, please?

numerical_columns = data.select_dtypes(include=['int64','float64']).columns.tolist()

print('Number of rows before discarding outlier = %d' % (data.shape[0]))

for i in numerical_columns:

    q1 = data[i].quantile(0.25)
    q3 = data[i].quantile(0.75)
    iqr = q3 - q1  # Interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    data = data.loc[(data[i] > fence_low) & (data[i] < fence_high)]

print('Number of rows after discarding outlier = %d' % (data.shape[0]))

Solution

  • The code below worked for me. Here col is the name of the numerical column of the DataFrame from which you want to remove outliers.

        import numpy as np

        # Remove outliers: keep only the rows whose value lies within
        # 3 standard deviations of the column mean
        df = df[np.abs(df[col] - df[col].mean()) <= (3 * df[col].std())]
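If you would rather keep the IQR rule from the question, here is a minimal sketch of one way to do it. Two things in the original loop can empty a DataFrame. If a column has q1 == q3 (for example a mostly constant or 0/1 column), both fences are equal and the strict > and < comparisons keep nothing. Also, any row with NaN in a numerical column fails both comparisons and is dropped. I can't tell from the question which of these applies to your data, so treat this as an assumption. The function name remove_iqr_outliers, the parameter k, and the sample data are made up for illustration.

```python
import pandas as pd


def remove_iqr_outliers(data, k=1.5):
    """Drop rows lying outside [q1 - k*iqr, q3 + k*iqr] in any numeric column."""
    numeric = data.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    fence_low = q1 - k * iqr
    fence_high = q3 + k * iqr
    # Inclusive bounds so that columns with iqr == 0 keep their typical value.
    inside = numeric.ge(fence_low, axis=1) & numeric.le(fence_high, axis=1)
    # Assumption: a missing value is not treated as an outlier.
    keep = (inside | numeric.isna()).all(axis=1)
    return data.loc[keep]


# Hypothetical sample data: "flag" is constant, "value" has one extreme entry.
data = pd.DataFrame({
    "flag": [0, 0, 0, 0, 0, 0],
    "value": [10.0, 11.0, 12.0, 13.0, 14.0, 500.0],
    "label": list("abcdef"),
})

print('Number of rows before discarding outlier = %d' % (data.shape[0]))
cleaned = remove_iqr_outliers(data)
print('Number of rows after discarding outlier = %d' % (cleaned.shape[0]))
```

Note that this version computes all fences once from the original data, whereas the loop in the question recomputes the quartiles on an already filtered DataFrame at every step, so the two can give different results even when the loop does not remove everything.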