Search code examples
pythonnumpymathpolynomials

How can I use multiple dimensional polynomials with numpy.polynomial?


I'm able to use numpy.polynomial to fit terms to 1D polynomials like f(x) = 1 + x + x^2. How can I fit multidimensional polynomials, like f(x,y) = 1 + x + x^2 + y + yx + y x^2 + y^2 + y^2 x + y^2 x^2? It looks like numpy doesn't support multidimensional polynomials at all: is that the case? In my real application, I have 5 dimensions of input and I am interested in hermite polynomials. It looks like the polynomials in scipy.special are also only available for one dimension of inputs.

# One dimension of data can be fit
x = np.random.random(100)
y = np.sin(x)
params = np.polynomial.polynomial.polyfit(x, y, 6)
np.polynomial.polynomial.polyval([0, .2, .5, 1.5], params)

array([ -5.01799432e-08,   1.98669317e-01,   4.79425535e-01,
         9.97606096e-01])

# When I try two dimensions, it fails. 
x = np.random.random((100, 2))
y = np.sin(5 * x[:,0]) + .4 * np.sin(x[:,1])
params = np.polynomial.polynomial.polyvander2d(x, y, [6, 6])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-5409f9a3e632> in <module>()
----> 1 params = np.polynomial.polynomial.polyvander2d(x, y, [6, 6])

/usr/local/lib/python2.7/site-packages/numpy/polynomial/polynomial.pyc in polyvander2d(x, y, deg)
   1201         raise ValueError("degrees must be non-negative integers")
   1202     degx, degy = ideg
-> 1203     x, y = np.array((x, y), copy=0) + 0.0
   1204 
   1205     vx = polyvander(x, degx)

ValueError: could not broadcast input array from shape (100,2) into shape (100)

Solution

  • It doesn't look like polyfit supports fitting multivariate polynomials, but you can do it by hand, with linalg.lstsq. The steps are as follows:

    1. Gather the degrees of monomials x**i * y**j you wish to use in the model. Think carefully about it: your current model already has 9 parameters, if you are going to push to 5 variables then with the current approach you'll end up with 3**5 = 243 parameters, a sure road to overfitting. Maybe limit to the monomials of __total_ degree at most 2 or three...

    2. Plug the x-points into each monomial; this gives a 1D array. Stack all such arrays as columns of a matrix.

    3. Solve a linear system with aforementioned matrix and with the right-hand side being the target values (I call them z because y is confusing when you also use x, y for two variables).

    Here it is:

    import numpy as np
    x = np.random.random((100, 2))
    z = np.sin(5 * x[:,0]) + .4 * np.sin(x[:,1])
    degrees = [(i, j) for i in range(3) for j in range(3)]  # list of monomials x**i * y**j to use
    matrix = np.stack([np.prod(x**d, axis=1) for d in degrees], axis=-1)   # stack monomials like columns
    coeff = np.linalg.lstsq(matrix, z)[0]    # lstsq returns some additional info we ignore
    print("Coefficients", coeff)    # in the same order as the monomials listed in "degrees"
    fit = np.dot(matrix, coeff)
    print("Fitted values", fit)
    print("Original values", y)