Search code examples
pythonscipyinterpolationcurve-fittingextrapolation

Curve fit or interpolation in a semilogy plot using scipy


I have very few data points and I want to create a line to best fit the data points when plotted in a semilogy scale. I have tried curve-fit and cubic interpolation from scipy, but none of them seems to be very reasonable to me compared to the data trend.

I would kindly ask you to check if there is a more efficient way to create a straight line fit for the data. Probably extrapolation can do, but I did not find a good documentation on extrapolation on python.

your help is very appreciated

import sys
import os
import numpy
import matplotlib.pyplot as plt
from pylab import *
from scipy.optimize import curve_fit
import scipy.optimize as optimization
from scipy.interpolate import interp1d
from scipy import interpolate

enter image description here

Mass500 = numpy.array([ 13.938 , 13.816,  13.661,  13.683,  13.621,  13.547,  13.477,  13.492,  13.237,
  13.232,  13.07,   13.048,  12.945,  12.861,  12.827,  12.577,  12.518])

y500 = numpy.array([  7.65103978e-06,   4.79865790e-06,   2.08218909e-05,   4.98385924e-06,
   5.63462673e-06,   2.90785458e-06,   2.21166794e-05,   1.34501705e-06,
   6.26021870e-07,   6.62368879e-07,   6.46735547e-07,   3.68589447e-07,
   3.86209019e-07,   5.61293275e-07,   2.41428755e-07,   9.62491134e-08,
   2.36892162e-07])



plt.semilogy(Mass500, y500, 'o')


# interpolation
f2 = interp1d(Mass500, y500, kind='cubic')
plt.semilogy(Mass500, f2(Mass500), '--')


# curve-fit
def line(x, a, b):
    return 10**(a*x+b)

#Initial guess.
x0     = numpy.array([1.e-6, 1.e-6])

print optimization.curve_fit(line, Mass500, y500, x0)
popt, pcov = curve_fit(line, Mass500, y500)
print popt
plt.semilogy(Mass500, line(Mass500, popt[0], popt[1]), 'r-')



plt.legend(['data', 'cubic', 'curve-fit'], loc='best')

show()

Solution

  • There are many regression functions available in numpy and scipy. scipy.stats.lingress is one of the simpler functions, and it returns common linear regression parameters.

    Here are two options for fitting semi-log data:

    1. Plot Transformed Data
    2. Rescale Axes and Transform Input/Output Function Values

    Given

    import numpy as np
    import scipy as sp
    import matplotlib.pyplot as plt
    
    %matplotlib inline
    
    
    # Data
    mass500 = np.array([ 
        13.938 , 13.816,  13.661,  13.683,
        13.621,  13.547,  13.477,  13.492,
        13.237,  13.232,  13.07,   13.048,  
        12.945,  12.861,  12.827,  12.577,  
        12.518
    ])
    
    y500 = np.array([  
        7.65103978e-06,   4.79865790e-06,   2.08218909e-05,   4.98385924e-06,
        5.63462673e-06,   2.90785458e-06,   2.21166794e-05,   1.34501705e-06,
        6.26021870e-07,   6.62368879e-07,   6.46735547e-07,   3.68589447e-07,
        3.86209019e-07,   5.61293275e-07,   2.41428755e-07,   9.62491134e-08,
        2.36892162e-07
    ])
    

    Code

    Option 1: Plot Transformed Data

    # Regression Function
    def regress(x, y):
        """Return a tuple of predicted y values and parameters for linear regression."""
        p = sp.stats.linregress(x, y)
        b1, b0, r, p_val, stderr = p
        y_pred = sp.polyval([b1, b0], x)
        return y_pred, p
    
    # Plotting
    x, y = mass500, np.log(y500)                      # transformed data
    y_pred, _ = regress(x, y)
    
    plt.plot(x, y, "mo", label="Data")
    plt.plot(x, y_pred, "k--", label="Pred.")
    plt.xlabel("Mass500")
    plt.ylabel("log y500")                            # label axis
    plt.legend()
    

    Output

    enter image description here

    A simple approach is to plot transformed data and label the appropriate log axes.


    Option 2: Rescale Axes and Transform Input/Output Function Values

    Code

    x, y = mass500, y500                              # data, non-transformed
    y_pred, _ = regress(x, np.log(y))                 # transformed input             
    
    plt.plot(x, y, "o", label="Data")
    plt.plot(x, np.exp(y_pred), "k--", label="Pred.") # transformed output
    plt.xlabel("Mass500")
    plt.ylabel("y500")
    plt.semilogy()
    plt.legend()
    

    Output

    enter image description here

    A second option is to alter the axes to semi-log scales (via plt.semilogy()). Here the non-transformed data naturally appears linear. Also notice the labels represent the data as-is.

    To make an accurate regression, all that remains is to transform data passed into the regression function (via np.log(x) or np.log10(x)) in order to return the proper regression parameters. This transformation is immediately reversed when plotting predicated values using a complementary operation, i.e. np.exp(x) or 10**x.