Search code examples
pythonnumpymathlinear-algebrapolynomials

numpy fit coefficients to linear combination of polynomials


I have data that I want to fit with polynomials. I have 200,000 data points, so I want an efficient algorithm. I want to use the numpy.polynomial package so that I can try different families and degrees of polynomials. Is there some way I can formulate this as a system of equations like Ax=b? Is there a better way to solve this than with scipy.minimize?

import numpy as np
from scipy.optimize import minimize as mini

x1 = np.random.random(2000)
x2 = np.random.random(2000)
y = 20 * np.sin(x1) + x2 - np.sin (30 * x1 - x2 / 10)
def fitness(x, degree=5):

    poly1 = np.polynomial.polynomial.polyval(x1, x[:degree])
    poly2 = np.polynomial.polynomial.polyval(x2, x[degree:])
    return np.sum((y - (poly1 + poly2)) ** 2 )

# It seems like I should be able to solve this as a system of equations
# x = np.linalg.solve(np.concatenate([x1, x2]), y)

# minimize the sum of the squared residuals to find the optimal polynomial coefficients
x = mini(fitness, np.ones(10))
print fitness(x.x)

Solution

  • Your intuition is right. You can solve this as a system of equations of the form Ax = b.

    However:

    • The system is overdefined and you want to get the least-squares solution, so you need to use np.linalg.lstsq instead of np.linalg.solve.

    • You can't use polyval because you need to separate the coefficients and powers of the independent variable.

    This is how to construct the system of equations and solve it:

    A = np.stack([x1**0, x1**1, x1**2, x1**3, x1**4, x2**0, x2**1, x2**2, x2**3, x2**4]).T             
    xx = np.linalg.lstsq(A, y)[0]
    print(fitness(xx))  # test the result with original fitness function
    

    Of course you can generalize over the degree:

    A = np.stack([x1**p for p in range(degree)] + [x2**p for p in range(degree)]).T
    

    With the example data, the least squares solution runs much faster than the minimize solution (800µs vs 35ms on my laptop). However, A can become quite large, so if memory is an issue minimize might still be an option.

    Update:

    Without any knowledge about the internals of the polynomial function things become tricky, but it is possible to separate terms and coefficients. Here is a somewhat ugly way to construct the system matrix A from a function like polyval:

    def construct_A(valfunc, degree):
        columns1 = []
        columns2 = []
        for p in range(degree):
            c = np.zeros(degree)
            c[p] = 1
            columns1.append(valfunc(x1, c))
            columns2.append(valfunc(x2, c))
        return np.stack(columns1 + columns2).T
    
    A = construct_A(np.polynomial.polynomial.polyval, 5)           
    xx = np.linalg.lstsq(A, y)[0]
    print(fitness(xx))  # test the result with original fitness function