python numpy replace median imputation

Replacing values with a median in Python


lat
50.63757782
50.6375742
50.6375742
50.6374077762
50.63757782
50.6374077762
50.63757782
50.63757782

I plotted a graph of these latitude values and noticed a sudden spike (an outlier). I want to replace every lat value with the median of the last three values so that I can see a meaningful result.

The output might be:

lat          lat_med
50.63757782  50.63757782
50.6375742   50.6375742
50.6375742   50.6375742
50.63740778  50.6375742
50.63757782  50.6375742
50.63740778  50.6375742
50.63757782  50.6375742
50.63757782  50.6375742

I have thousands of such lat values and need to solve this using a for loop. I know that the following code has errors, and since I am a beginner in Python, I would appreciate your help in fixing it.

for i in range(0,len(df['lat'])):
    df['lat_med'][i]=numpy.median(numpy.array(df['lat'][i],df['lat'][i-2]))

I just realized that a median over three points does not serve my purpose and I need to consider five values. Is there a way to change the median function so that it works for as many values as I want? Thank you for your help.

def median(a, b, c):
    if a > b and a > c:
        return b if b > c else c

    if a < b and a < c:
        return b if b < c else c

    return a

Solution

  • Just go through the elements from the second to the second-to-last and, for each one, save the median of that element, the previous element, and the next element. Note that the first and last elements are left as they were.

    Try this:

    lat = [50.63757782, 50.6375742, 50.6375742, 50.6374077762, 50.63757782, 50.6374077762, 50.63757782, 50.63757782]
    
    # returns median value out of the three values
    def median(a, b, c):
        if a > b and a > c:
            return b if b > c else c
    
        if a < b and a < c:
            return b if b < c else c
    
        return a
    
    
    # add the first element
    filtered = [lat[0]]
    
    for i in range(1, len(lat) - 1):
        filtered += [median(lat[i - 1], lat[i], lat[i + 1])]
    
    # add the last element
    filtered += [lat[-1]]
    
    print(filtered)
    

    What you are doing is a very basic median filter.
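
The question also asks how to use five values instead of three. A minimal sketch of one way to generalize the loop above to any odd window size follows. The function name `median_filter` and the `window` parameter are made up for this illustration, and it assumes the same edge convention as the three-value version: elements that do not have a full window around them are left unchanged. It uses `statistics.median` from the standard library rather than the hand-written three-argument `median`.

```python
from statistics import median


def median_filter(values, window=5):
    """Centered median filter over an odd-sized window.

    Hypothetical helper for illustration. Elements too close to either
    end to have a full window are copied through unchanged (assumption,
    matching the three-value loop above).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    half = window // 2
    filtered = []
    for i in range(len(values)):
        if i < half or i >= len(values) - half:
            # Not enough neighbours on one side: keep the original value.
            filtered.append(values[i])
        else:
            # Median of the element and `half` neighbours on each side.
            filtered.append(median(values[i - half:i + half + 1]))
    return filtered


lat = [50.63757782, 50.6375742, 50.6375742, 50.6374077762,
       50.63757782, 50.6374077762, 50.63757782, 50.63757782]

print(median_filter(lat, window=3))  # same result as the three-value loop
print(median_filter(lat, window=5))
```

If the data is already in a pandas DataFrame, as in the question, `df['lat'].rolling(5, center=True).median()` should compute a comparable centered rolling median without an explicit loop. Note that by default it yields NaN where the window is incomplete instead of keeping the original value.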