python numpy scipy data-fitting

Which library/function should I use to fit a multivariate polynomial to my data?


I have data that depends on 4 independent variables (x1,x2,x3,x4) and I need a model (available in Python) to evaluate f(x1,x2,x3,x4) outside the data points. In principle, if I hold 3 of my variables constant, I can always use a polynomial fit of a reasonable degree (<5) to interpolate the data in the remaining dimension. I would therefore like to generate a function that is capable of interpolating in all dimensions at once using a multivariate polynomial fit. Note that the underlying function is non-linear (meaning that I should expect cross terms of the form x1^n*x2^m where n,m are not 0). What do you recommend?

To illustrate I am including a small sample of data:

(Note that some variables appear to be constant only because this is just a small sample.)

x1  x2  x3  x4  f
15  10  5   3   0.621646
15  10  5   5   0.488879
15  10  5   10  0.490204
15  10  7   0   0.616027
15  10  7   0.5 0.615497
15  10  7   1   0.619804
15  10  7   3   0.614494
15  10  7   5   0.556772
15  10  7   10  0.555393
15  20  0.5 0   0.764692
15  20  0.5 0.5 0.78774
15  20  0.5 1   0.799749
15  20  0.5 3   0.567796
15  20  0.5 5   0.328497
15  20  0.5 10  0.0923708
15  20  1   0   0.802219
15  20  1   0.5 0.811475
15  20  1   1   0.822908
15  20  1   3   0.721053
15  20  1   5   0.573549
15  20  1   10  0.206259
15  20  2   0   0.829069
15  20  2   0.5 0.831135
15  0   7   1   0.240144
15  0   7   3   0.258186
15  0   7   5   0.260836

Solution

  • You can do multivariate curve fitting using the scipy.optimize.curve_fit() function. It is well documented, and there are multiple questions and answers on StackOverflow about using it for multivariate fitting.

    For your case, something like this can help you get started:

    import numpy as np
    from scipy.optimize import curve_fit
    
    # Example function to fit to your data.
    # x has shape (4, N): x[0] is x1, x[1] is x2, and so on.
    def non_linear_func(x, a, b, c, d):
        return x[0] ** a * x[1] ** b + x[2] ** c + x[3] ** d
    
    # X is your multivariate x data, e.g. X = np.vstack([x1, x2, x3, x4])
    # y is your dependent data (f in the question), length N
    
    # p0 is an initial guess for a, b, c, d in your fitting function
    p0 = [1, 2, 3, 4]
    
    fitParams, fitCov = curve_fit(non_linear_func, X, y, p0=p0)
    

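    A couple of things to note: make sure that the X and y you pass to curve_fit() have the correct dimensions. curve_fit() passes X straight through to your fitting function, and the function above indexes it as x[0], x[1], and so on, so X must have dimensions of M x N, where M is the number of independent variables and N is the number of data points (one row per variable). If your data is stored as N x M, with one row per data point, pass its transpose. y should be of length N.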

    You must also define your fitting function in the form you want, and try to give an initial guess, p0, for the parameters in the function to help curve_fit find the optimal values.

    Hope that helps. There are lots of good answers on multivariate fitting with curve_fit() on StackOverflow, and the curve_fit documentation should be of help as well.
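
As a rough sketch of how the points above might fit together for a polynomial with cross terms: the block below builds every monomial up to total degree 2 in four variables, arranges the data as a (4, N) array, and fits the coefficients with curve_fit(). The data is a synthetic stand-in, not the sample from the question, and the names monomial_exponents, poly_model and EXPONENTS, as well as the choice of degree 2, are assumptions made for illustration.

```python
import itertools

import numpy as np
from scipy.optimize import curve_fit


def monomial_exponents(n_vars, max_degree):
    """Return every exponent tuple whose total degree is <= max_degree."""
    return [
        powers
        for powers in itertools.product(range(max_degree + 1), repeat=n_vars)
        if sum(powers) <= max_degree
    ]


# Hypothetical choice: all terms x1^i * x2^j * x3^k * x4^l with i+j+k+l <= 2.
EXPONENTS = monomial_exponents(n_vars=4, max_degree=2)


def poly_model(x, *coeffs):
    """Polynomial in 4 variables; x has shape (4, N), one row per variable."""
    result = np.zeros(x.shape[1])
    for coeff, powers in zip(coeffs, EXPONENTS):
        term = np.ones(x.shape[1])
        for row, power in zip(x, powers):
            term = term * row ** power
        result = result + coeff * term
    return result


# Synthetic stand-in data (NOT the data from the question), stored the way a
# table is usually read: one row per data point, columns x1..x4.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 2.0, size=(200, 4))
x1, x2, x3, x4 = points.T
y = 1.0 + 0.5 * x1 * x2 - 0.3 * x3 ** 2 + 0.2 * x4

# The model indexes x by variable, so transpose the table to shape (4, N).
X = points.T

# With *coeffs, curve_fit cannot infer the parameter count; p0 sets it.
p0 = np.ones(len(EXPONENTS))
fit_params, fit_cov = curve_fit(poly_model, X, y, p0=p0)

# Evaluate the fitted polynomial at a new point, given as shape (4, 1).
new_point = np.array([[1.0], [0.5], [1.5], [0.2]])
prediction = poly_model(new_point, *fit_params)[0]
print(prediction)
```

Because a polynomial is linear in its coefficients, a linear least-squares solver such as numpy.linalg.lstsq applied to the matrix of monomial terms would likely do the same job; curve_fit() is used here only to stay close to the answer above.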