python scipy interpolation curve-fitting extrapolation

Curve fit or interpolation in a semilogy plot using scipy

I have very few data points and I want to create a line to best fit the data points when plotted in a semilogy scale. I have tried curve-fit and cubic interpolation from scipy, but none of them seems to be very reasonable to me compared to the data trend.

I would kindly ask you to check if there is a more efficient way to create a straight line fit for the data. Probably extrapolation can do, but I did not find a good documentation on extrapolation on python.

your help is very appreciated

import sys
import os
import numpy
import matplotlib.pyplot as plt
from pylab import *
from scipy.optimize import curve_fit
import scipy.optimize as optimization
from scipy.interpolate import interp1d
from scipy import interpolate

Mass500 = numpy.array([ 13.938 , 13.816,  13.661,  13.683,  13.621,  13.547,  13.477,  13.492,  13.237,
  13.232,  13.07,   13.048,  12.945,  12.861,  12.827,  12.577,  12.518])

y500 = numpy.array([  7.65103978e-06,   4.79865790e-06,   2.08218909e-05,   4.98385924e-06,
   5.63462673e-06,   2.90785458e-06,   2.21166794e-05,   1.34501705e-06,
   6.26021870e-07,   6.62368879e-07,   6.46735547e-07,   3.68589447e-07,
   3.86209019e-07,   5.61293275e-07,   2.41428755e-07,   9.62491134e-08,
   2.36892162e-07])



plt.semilogy(Mass500, y500, 'o')


# interpolation
f2 = interp1d(Mass500, y500, kind='cubic')
plt.semilogy(Mass500, f2(Mass500), '--')


# curve-fit
def line(x, a, b):
    return 10**(a*x+b)

#Initial guess.
x0     = numpy.array([1.e-6, 1.e-6])

print optimization.curve_fit(line, Mass500, y500, x0)
popt, pcov = curve_fit(line, Mass500, y500)
print popt
plt.semilogy(Mass500, line(Mass500, popt[0], popt[1]), 'r-')



plt.legend(['data', 'cubic', 'curve-fit'], loc='best')

show()

Solution

There are many regression functions available in numpy and scipy. scipy.stats.lingress is one of the simpler functions, and it returns common linear regression parameters.

Here are two options for fitting semi-log data:

Plot Transformed Data
Rescale Axes and Transform Input/Output Function Values

Given

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

%matplotlib inline


# Data
mass500 = np.array([ 
    13.938 , 13.816,  13.661,  13.683,
    13.621,  13.547,  13.477,  13.492,
    13.237,  13.232,  13.07,   13.048,  
    12.945,  12.861,  12.827,  12.577,  
    12.518
])

y500 = np.array([  
    7.65103978e-06,   4.79865790e-06,   2.08218909e-05,   4.98385924e-06,
    5.63462673e-06,   2.90785458e-06,   2.21166794e-05,   1.34501705e-06,
    6.26021870e-07,   6.62368879e-07,   6.46735547e-07,   3.68589447e-07,
    3.86209019e-07,   5.61293275e-07,   2.41428755e-07,   9.62491134e-08,
    2.36892162e-07
])

Code

Option 1: Plot Transformed Data

# Regression Function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = sp.polyval([b1, b0], x)
    return y_pred, p

# Plotting
x, y = mass500, np.log(y500)                      # transformed data
y_pred, _ = regress(x, y)

plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Mass500")
plt.ylabel("log y500")                            # label axis
plt.legend()

Output

A simple approach is to plot transformed data and label the appropriate log axes.

Option 2: Rescale Axes and Transform Input/Output Function Values

Code

x, y = mass500, y500                              # data, non-transformed
y_pred, _ = regress(x, np.log(y))                 # transformed input             

plt.plot(x, y, "o", label="Data")
plt.plot(x, np.exp(y_pred), "k--", label="Pred.") # transformed output
plt.xlabel("Mass500")
plt.ylabel("y500")
plt.semilogy()
plt.legend()

Output

A second option is to alter the axes to semi-log scales (via plt.semilogy()). Here the non-transformed data naturally appears linear. Also notice the labels represent the data as-is.

To make an accurate regression, all that remains is to transform data passed into the regression function (via np.log(x) or np.log10(x)) in order to return the proper regression parameters. This transformation is immediately reversed when plotting predicated values using a complementary operation, i.e. np.exp(x) or 10**x.