python-2.7 scipy curve-fitting outliers best-fit-curve

Outlier detection using recursive curve fitting and error elimination


Is there a way to do anomaly detection in a dataset by recursively fitting a curve and removing the points with the largest squared error with respect to the curve, until the error falls below an acceptable threshold?

I am using the scipy.optimize.curve_fit function in Python 2.7, and I would prefer a solution in Python.


Solution

  • You are most probably describing recursive regression (which is quite easy in Matlab). In Python, try scipy.optimize.curve_fit.

    For a simple degree-3 polynomial fit, the following works using numpy.polyfit and numpy.poly1d. It covers only the fitting and plotting step, not the removal of points.

    import numpy as np
    import matplotlib.pyplot as plt
    
    points = np.array([(1, 1), (2, 4), (3, 1), (9, 3)])
    # get x and y vectors
    x = points[:,0]
    y = points[:,1]
    
    # calculate polynomial
    z = np.polyfit(x, y, 3)
    f = np.poly1d(z)
    
    # calculate new x's and y's
    x_new = np.linspace(x[0], x[-1], 50)
    y_new = f(x_new)
    
    plt.plot(x,y,'o', x_new, y_new)
    plt.xlim([x[0]-1, x[-1] + 1 ])
    plt.show()
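
The snippet above fits a curve once. The refit-and-remove loop described in the question could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive method: the names `cubic` and `fit_and_reject`, the absolute-residual `threshold`, the `max_iter` cap, and the synthetic data are all hypothetical choices made for illustration. It drops the single worst point per round (ranking by absolute residual gives the same order as ranking by squared error) and refits with scipy.optimize.curve_fit until every remaining residual is within the threshold.

```python
import numpy as np
from scipy.optimize import curve_fit


def cubic(x, a, b, c, d):
    # Hypothetical model; any function accepted by curve_fit can be used.
    return a * x**3 + b * x**2 + c * x + d


def fit_and_reject(model, x, y, threshold, max_iter=20):
    """Refit repeatedly, dropping the worst-fitting point each round.

    Stops when the largest absolute residual among the kept points is
    <= threshold, when max_iter removals have been made, or when removing
    another point would leave no more points than parameters.
    Returns the fitted parameters and a boolean mask of the kept points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(x.size, dtype=bool)

    params, _cov = curve_fit(model, x[keep], y[keep])
    for _round in range(max_iter):
        residuals = np.abs(y - model(x, *params))
        residuals[~keep] = 0.0  # ignore points that were already rejected
        worst = int(np.argmax(residuals))
        if residuals[worst] <= threshold:
            break  # every kept point is within the threshold
        if keep.sum() - 1 <= len(params):
            break  # too few points left for a meaningful fit
        keep[worst] = False
        params, _cov = curve_fit(model, x[keep], y[keep])
    return params, keep


# Synthetic data (assumed for illustration): an exact cubic with two
# artificially injected outliers at indices 5 and 20.
x = np.linspace(0, 10, 30)
y = 0.5 * x**3 - 2.0 * x**2 + x + 3.0
y[5] += 50.0
y[20] -= 40.0

params, keep = fit_and_reject(cubic, x, y, threshold=1.0)
outlier_indices = np.where(~keep)[0]
print(outlier_indices)
```

Removing one point per round is slower than removing every point above the threshold at once, but it keeps a single large outlier from distorting the fit enough to get good points rejected alongside it. The threshold has to be chosen for the noise level of the data at hand.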