I have data that depends on 4 independent variables (x1, x2, x3, x4), and I need a model (available in Python) to evaluate f(x1, x2, x3, x4) outside the data points. In principle, if I hold 3 of the variables constant, I can always use a polynomial fit of reasonable degree (<5) to interpolate the data in the remaining dimension. I would therefore like to generate a function that can interpolate in all dimensions at once using a multivariate polynomial fit. Note that the underlying function is non-linear, so I should expect terms of the form x1^n*x2^m where n and m are not 0. What do you recommend?
To illustrate, here is a small sample of the data. (Some variables appear to be constant only because this is a small sample.)
x1 x2 x3 x4 f
15 10 5 3 0.621646
15 10 5 5 0.488879
15 10 5 10 0.490204
15 10 7 0 0.616027
15 10 7 0.5 0.615497
15 10 7 1 0.619804
15 10 7 3 0.614494
15 10 7 5 0.556772
15 10 7 10 0.555393
15 20 0.5 0 0.764692
15 20 0.5 0.5 0.78774
15 20 0.5 1 0.799749
15 20 0.5 3 0.567796
15 20 0.5 5 0.328497
15 20 0.5 10 0.0923708
15 20 1 0 0.802219
15 20 1 0.5 0.811475
15 20 1 1 0.822908
15 20 1 3 0.721053
15 20 1 5 0.573549
15 20 1 10 0.206259
15 20 2 0 0.829069
15 20 2 0.5 0.831135
15 0 7 1 0.240144
15 0 7 3 0.258186
15 0 7 5 0.260836
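For reference, the one-variable fit the question describes can be sketched with NumPy's `polyfit`, here on the sample rows with x1 = 15, x2 = 20, x3 = 0.5 held fixed. The cubic degree is an assumption within the question's "<5" range. A fit like this only describes that one slice, which is why a joint fit in all four variables is wanted.

```python
import numpy as np

# Slice of the sample data with x1 = 15, x2 = 20, x3 = 0.5 held fixed
x4 = np.array([0.0, 0.5, 1.0, 3.0, 5.0, 10.0])
f = np.array([0.764692, 0.78774, 0.799749, 0.567796, 0.328497, 0.0923708])

# Least-squares cubic in x4 (assumed degree; coefficients are highest power first)
coeffs = np.polyfit(x4, f, deg=3)

# Evaluate the 1-D fit at an x4 value that is not in the data
f_at_2 = np.polyval(coeffs, 2.0)
print(coeffs, f_at_2)
```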
You can do multivariate curve fitting using the `scipy.optimize.curve_fit()` function. It is well documented, and there are multiple questions and answers on StackOverflow about using it for multivariate fitting.
For your case, something like this can help you get started:
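```python
import numpy as np
from scipy.optimize import curve_fit

# Example function to fit to your data; x[0]..x[3] are x1..x4
def non_linear_func(x, a, b, c, d):
    return x[0] ** a * x[1] ** b + x[2] ** c + x[3] ** d

# Load the table ("data.txt" is a hypothetical file name; skiprows=1 skips the header)
data = np.loadtxt("data.txt", skiprows=1)
X = data[:, :4].T  # multivariate x data, one row per variable
y = data[:, 4]     # your f values, one per data point

# p0 is an initial guess for a, b, c, d in your fitting function
p0 = [1, 2, 3, 4]
fitParams, fitCov = curve_fit(non_linear_func, X, y, p0=p0)
```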
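A couple of things to note. First, make sure that the `X` and `y` you pass to `curve_fit()` have the correct dimensions. Because the fitting function indexes the variables as `x[0]`, `x[1]`, and so on, `X` must have dimensions M x N, where M is the number of independent variables (4 here) and N is the number of data points. If your data is stored as an N x M table (one row per point), pass its transpose. `y` should be of length N.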
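A quick way to check the orientation, assuming the data is read with `np.loadtxt` (six rows of the sample are embedded here instead of a file, and `model` is a hypothetical stand-in for your fitting function): a function that indexes `x[0]`..`x[3]` only returns one value per data point when `X` has one row per variable.

```python
import io
import numpy as np

# Six rows of the sample table (columns: x1 x2 x3 x4 f)
rows = io.StringIO("""\
15 10 5 3 0.621646
15 10 5 5 0.488879
15 10 7 0 0.616027
15 20 0.5 0 0.764692
15 20 1 3 0.721053
15 0 7 5 0.260836
""")
data = np.loadtxt(rows)        # shape (N, 5): one row per data point

X_wrong = data[:, :4]          # shape (N, 4): one row per data point
X = data[:, :4].T              # shape (4, N): one row per variable
y = data[:, 4]                 # shape (N,)

# Hypothetical model that, like the example above, indexes x[0]..x[3]
def model(x):
    return x[0] + x[1] + x[2] + x[3]

print(model(X).shape)          # (6,): one prediction per data point, matches y
print(model(X_wrong).shape)    # (4,): sums the first four data points instead
```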
You must also define your fitting function in the form you expect the data to follow, and try to give an initial guess, `p0`, for its parameters to help `curve_fit` find the optimal values.
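Since the question asks for a multivariate polynomial, one way to apply this is to make the fitting function a polynomial whose terms include cross products such as x1**n * x2**m. The sketch below assumes a total degree of 2 (`DEGREE`, `EXPONENTS` and `poly_model` are hypothetical names) and uses synthetic data in place of the real table. Because the function takes `*coeffs`, `curve_fit` cannot count the parameters by introspection, so `p0` must be given.

```python
import itertools
import numpy as np
from scipy.optimize import curve_fit

DEGREE = 2  # assumed total degree; raise it if the data need more curvature

# All exponent tuples (n1, n2, n3, n4) with n1 + n2 + n3 + n4 <= DEGREE,
# which includes cross terms such as x1**n * x2**m
EXPONENTS = [e for e in itertools.product(range(DEGREE + 1), repeat=4)
             if sum(e) <= DEGREE]

def poly_model(x, *coeffs):
    """Sum over k of coeffs[k] * prod_i x[i]**EXPONENTS[k][i]; x has shape (4, N)."""
    terms = [np.prod([x[i] ** e[i] for i in range(4)], axis=0) for e in EXPONENTS]
    return np.dot(coeffs, terms)

# Synthetic stand-in data: replace with your own X (shape (4, N)) and y (shape (N,))
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(4, 200))
true_coeffs = rng.normal(size=len(EXPONENTS))
y = poly_model(X, *true_coeffs)

# p0 sets both the starting point and the number of coefficients
p0 = np.zeros(len(EXPONENTS))
fitParams, fitCov = curve_fit(poly_model, X, y, p0=p0)

# Evaluate the fitted polynomial at a point that is not in the data
x_new = np.array([[1.5], [0.3], [0.7], [1.1]])  # shape (4, 1)
print(poly_model(x_new, *fitParams))
```

A polynomial is linear in its coefficients, so the same coefficients could also be obtained directly with `np.linalg.lstsq` on a matrix of the terms; `curve_fit` is used here only to stay with the approach above. The number of terms grows quickly (70 for total degree 4 in four variables), so you need clearly more data points than terms.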
Hope that helps. There are lots of good answers on multivariate fitting with `curve_fit()` on StackOverflow (see here and here), and the `curve_fit` documentation should be of help as well.