Search code examples
pythonseabornnon-linear-regression

Non-linear regression in Seaborn Python


I have the following dataframe that I wish to perform some regression on. I am using Seaborn but can't quite seem to find a non-linear function that fits. Below is my code and it's output, and below that is the dataframe I am using, df. Note I have truncated the axis in this plot.

I would like to fit either a Poisson or Gaussian distribution style of function.

import pandas 
import seaborn


graph = seaborn.lmplot('$R$', 'Equilibrium Value', data = df, fit_reg=True, order=2, ci=None)
graph.set(xlim = (-0.25,10))

However this produces the following figure.

enter image description here

df

     R          Equilibrium Value
0   5.102041    7.849315e-03
1   4.081633    2.593005e-02
2   0.000000    9.990000e-01
3   30.612245   4.197446e-14
4   14.285714   6.730133e-07
5   12.244898   5.268202e-06
6   15.306122   2.403316e-07
7   39.795918   3.292955e-18
8   19.387755   3.875505e-09
9   45.918367   5.731842e-21
10  1.020408    9.936863e-01
11  50.000000   8.102142e-23
12  2.040816    7.647420e-01
13  48.979592   2.353931e-22
14  43.877551   4.787156e-20
15  34.693878   6.357120e-16
16  27.551020   9.610208e-13
17  29.591837   1.193193e-13
18  31.632653   1.474959e-14
19  3.061224    1.200807e-01
20  23.469388   6.153965e-11
21  33.673469   1.815181e-15
22  42.857143   1.381050e-19
23  25.510204   7.706746e-12
24  13.265306   1.883431e-06
25  9.183673    1.154141e-04
26  41.836735   3.979575e-19
27  36.734694   7.770915e-17
28  18.367347   1.089037e-08
29  44.897959   1.657448e-20
30  16.326531   8.575577e-08
31  28.571429   3.388120e-13
32  40.816327   1.145412e-18
33  11.224490   1.473268e-05
34  24.489796   2.178927e-11
35  21.428571   4.893541e-10
36  32.653061   5.177167e-15
37  8.163265    3.241799e-04
38  22.448980   1.736254e-10
39  46.938776   1.979881e-21
40  47.959184   6.830820e-22
41  26.530612   2.722925e-12
42  38.775510   9.456077e-18
43  6.122449    2.632851e-03
44  37.755102   2.712309e-17
45  10.204082   4.121137e-05
46  35.714286   2.223883e-16
47  20.408163   1.377819e-09
48  17.346939   3.057373e-08
49  7.142857    9.167507e-04

EDIT

Attached are two graphs produced from both this and another data set when increasing the order parameter beyond 20.

enter image description here

enter image description here

Order = 3 enter image description here


Solution

  • I have problems understanding why a lmplot is needed here. Usually you want to perform a fit by taking a model function and fit it to the data. Assume you want a gaussian function

    model = lambda x, A, x0, sigma, offset:  offset+A*np.exp(-((x-x0)/sigma)**2)
    

    you can fit it to your data with scipy.optimize.curve_fit:

    popt, pcov = curve_fit(model, df["R"].values, 
                                  df["EquilibriumValue"].values, p0=[1,0,2,0])
    

    Complete code:

    import pandas as pd
    import numpy as np
    from scipy.optimize import curve_fit
    import matplotlib.pyplot as plt
    
    df = ... # your dataframe
    
    # plot data
    plt.scatter(df["R"].values,df["EquilibriumValue"].values, label="data")
    
    # Fitting
    model = lambda x, A, x0, sigma, offset:  offset+A*np.exp(-((x-x0)/sigma)**2)
    popt, pcov = curve_fit(model, df["R"].values, 
                                  df["EquilibriumValue"].values, p0=[1,0,2,0])
    #plot fit
    x = np.linspace(df["R"].values.min(),df["R"].values.max(),250)
    plt.plot(x,model(x,*popt), label="fit")
    
    # Fitting
    model2 = lambda x, sigma:  model(x,1,0,sigma,0)
    popt2, pcov2 = curve_fit(model2, df["R"].values, 
                                  df["EquilibriumValue"].values, p0=[2])
    #plot fit2
    x2 = np.linspace(df["R"].values.min(),df["R"].values.max(),250)
    plt.plot(x2,model2(x2,*popt2), label="fit2")
    
    plt.xlim(None,10)
    plt.legend()
    plt.show()
    

    enter image description here