python scikit-learn regression linear-regression ransac

How to approximate the linear part of a curve with the RANSACRegressor


I am trying to approximate the linear increase of a curve I got from an experiment. After a bit of research, I decided to try the RANSACRegressor from scikit-learn.

The orange circles are my dataset. I drew the black line by hand; it is what I want to estimate with the RANSACRegressor. The blue line is what the RANSACRegressor actually gives me.

[Figure: dataset (orange circles), the linear part I want to estimate (black line), and what I get with the RANSACRegressor (blue line)]

from sklearn.linear_model import RANSACRegressor, LinearRegression
import matplotlib.pyplot as plt
import numpy as np

t=np.linspace(0,200,41)

data=np.array([0.        , 0.0096128 , 0.02310166, 0.04523012, 0.08117545,
       0.13629904, 0.21510879, 0.31956002, 0.44718081, 0.58977918,
       0.73352921, 0.8609058 , 0.95425321, 1.        , 0.99212502,
       0.93359348, 0.83530922, 0.7129898 , 0.58321985, 0.45996872,
       0.35254691, 0.26521134, 0.19806381, 0.14856588, 0.11304068,
       0.08778646, 0.0696888 , 0.05641965, 0.04637975, 0.03853317,
       0.03223059, 0.02706633, 0.02277986, 0.01919466, 0.01618332,
       0.01364839, 0.01151211, 0.00971083, 0.00819165, 0.00691022,
       0.00582927])

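# Decay correction, normalised to a maximum of 1
data_c=(data/np.exp(-0.000567*60*t)/np.max((data/np.exp(-0.000567*60*t))))

# Intended to restrict the fit to the linear increase, as there are several linear parts of the curve
# (t has only 41 points, so the slice [0:100] keeps all of them)
treg=t[0:100]
datareg=data[0:100]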

reg=RANSACRegressor(estimator=LinearRegression(),random_state=0, residual_threshold=0.01, min_samples=30).fit(treg.reshape(-1, 1),datareg)
linear_part=reg.predict(t.reshape(-1, 1))

plt.plot(t, data_c,marker='o', label="Decay corrected data", linestyle='none', color='darkorange',markerfacecolor='none')
plt.plot(t,linear_part)
plt.ylim([-0.1,1.1])
plt.xlim([-9,209])


plt.xlabel("Time [min]")
plt.ylabel("Activity [a.u.]")

That is an adapted snippet of the code I used. I tried changing residual_threshold as well as min_samples, but I am never able to capture the increase I actually want. I think I am doing something fundamentally wrong, but I am not sure what exactly.


Solution

  • I found my mistake: I had to stack the time (x) and the data points (y) into a single array and then fit the model to that, since the RANSAC regressor expects a point cloud.
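
For anyone landing here later, here is a minimal sketch of one way to read the fix above. It is not the exact code I ended up with. It assumes the stacked array is an (n_points, 2) array of (time, activity) rows. scikit-learn's RANSACRegressor.fit still takes X and y as separate arguments, so the sketch slices the two columns back out of the stacked array. Cutting the array at the maximum of the corrected curve, and the values min_samples=2 and residual_threshold=0.05, are illustrative assumptions and will likely need tuning for other data.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

t = np.linspace(0, 200, 41)

data = np.array([0.        , 0.0096128 , 0.02310166, 0.04523012, 0.08117545,
       0.13629904, 0.21510879, 0.31956002, 0.44718081, 0.58977918,
       0.73352921, 0.8609058 , 0.95425321, 1.        , 0.99212502,
       0.93359348, 0.83530922, 0.7129898 , 0.58321985, 0.45996872,
       0.35254691, 0.26521134, 0.19806381, 0.14856588, 0.11304068,
       0.08778646, 0.0696888 , 0.05641965, 0.04637975, 0.03853317,
       0.03223059, 0.02706633, 0.02277986, 0.01919466, 0.01618332,
       0.01364839, 0.01151211, 0.00971083, 0.00819165, 0.00691022,
       0.00582927])

# Decay correction as in the question, normalised to a maximum of 1.
data_c = data / np.exp(-0.000567 * 60 * t)
data_c /= data_c.max()

# Stack time (x) and corrected activity (y) into one (n_points, 2) array.
points = np.column_stack((t, data_c))

# Assumption: keep only the rows up to the maximum of the corrected curve,
# so the falling flank and the tail cannot form the consensus set.
rising = points[: np.argmax(points[:, 1]) + 1]

# The default estimator of RANSACRegressor is LinearRegression.
# min_samples=2 lets each trial fit a line through just two points.
reg = RANSACRegressor(min_samples=2, residual_threshold=0.05, random_state=0)
reg.fit(rising[:, [0]], rising[:, 1])  # X must be 2-D, y 1-D

slope = reg.estimator_.coef_[0]
intercept = reg.estimator_.intercept_
inliers = rising[reg.inlier_mask_]  # rows that RANSAC kept as the linear part

print(f"slope={slope:.4f} per min, intercept={intercept:.4f}")
print(f"{len(inliers)} of {len(rising)} rising-flank points are inliers")
```

To draw the line over the original plot, reg.predict(t.reshape(-1, 1)) gives the fitted values, and inliers[:, 0] shows which time range was treated as linear. Note that the code in the question fits data rather than data_c, and uses min_samples=30 out of 41 points, so no single trial could be drawn from the short rising part alone.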