Filtering out outliers in Pandas dataframe with rolling median


I am trying to filter some outliers out of a scatter plot of GPS elevation displacements against dates.

I'm trying to use df.rolling to compute a median and standard deviation for each window, and then remove a point if it lies more than 3 standard deviations from the median.

However, I can't figure out how to loop through the column and compare each value with its rolling median.

Here is the code I have so far:

import pandas as pd
import numpy as np

def median_filter(df, window):
    cnt = 0
    median = df['b'].rolling(window).median()
    std = df['b'].rolling(window).std()
    for row in df.b:
        # compare each value to its median
        pass  # placeholder: the comparison is not implemented yet


df = pd.DataFrame(np.random.randint(0,100,size=(100,2)), columns = ['a', 'b'])

median_filter(df, 10)

How can I loop through and compare each point and remove it?


Solution

  • Just filter the DataFrame with a boolean mask; no loop is needed:

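    window = 10  # same window size as in the question

    # rolling statistics over the trailing `window` rows
    df['median'] = df['b'].rolling(window).median()
    df['std'] = df['b'].rolling(window).std()

    # keep only rows within 3 rolling standard deviations of the rolling median;
    # the first window-1 rows have NaN statistics, so they are dropped as well
    df = df[(df.b <= df['median'] + 3*df['std']) & (df.b >= df['median'] - 3*df['std'])]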
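
    For reuse, the same filter could be wrapped in a function along the lines of the sketch below. This is one possible shape, not the only one: the parameter names `column` and `n_std` are made up for illustration, and keeping the warm-up rows (where the rolling std is still NaN) instead of dropping them is an assumption that may not suit every dataset.

```python
import pandas as pd


def median_filter(df, column, window, n_std=3.0):
    """Drop rows lying more than n_std rolling stds from the rolling median.

    `column` and `n_std` are illustrative names, not part of the original answer.
    """
    rolling = df[column].rolling(window)
    median = rolling.median()
    std = rolling.std()

    # Absolute distance of each value from the median of its trailing window.
    deviation = (df[column] - median).abs()

    # Assumption: rows in the warm-up period (NaN statistics) are kept
    # rather than dropped, since there is nothing to judge them against.
    keep = (deviation <= n_std * std) | std.isna()
    return df[keep]


# Synthetic example: values alternate between 10 and 11, with one spike.
values = [10.0 if i % 2 == 0 else 11.0 for i in range(40)]
values[25] = 500.0
df = pd.DataFrame({"a": range(40), "b": values})

filtered = median_filter(df, "b", window=20)
print(len(df), len(filtered))
print(25 in filtered.index)
```

    One thing to watch with this approach: the spike is part of its own window, so it inflates the standard deviation it is compared against. For a lone spike among otherwise near-constant values, the spike sits only about sqrt(window) standard deviations from the median, so with a small window (for example 5) a threshold of 3 may never be reached. A larger window or a smaller `n_std` makes the filter more sensitive.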