python numpy math linear-algebra polynomials

numpy fit coefficients to linear combination of polynomials

I have data that I want to fit with polynomials. I have 200,000 data points, so I want an efficient algorithm. I want to use the numpy.polynomial package so that I can try different families and degrees of polynomials. Is there some way I can formulate this as a system of equations like Ax=b? Is there a better way to solve this than with scipy.minimize?

import numpy as np
from scipy.optimize import minimize as mini

x1 = np.random.random(2000)
x2 = np.random.random(2000)
y = 20 * np.sin(x1) + x2 - np.sin (30 * x1 - x2 / 10)
def fitness(x, degree=5):

    poly1 = np.polynomial.polynomial.polyval(x1, x[:degree])
    poly2 = np.polynomial.polynomial.polyval(x2, x[degree:])
    return np.sum((y - (poly1 + poly2)) ** 2 )

# It seems like I should be able to solve this as a system of equations
# x = np.linalg.solve(np.concatenate([x1, x2]), y)

# minimize the sum of the squared residuals to find the optimal polynomial coefficients
x = mini(fitness, np.ones(10))
print fitness(x.x)

Solution

Your intuition is right. You can solve this as a system of equations of the form Ax = b.

However:

The system is overdefined and you want to get the least-squares solution, so you need to use np.linalg.lstsq instead of np.linalg.solve.
~~You can't use polyval because you need to separate the coefficients and powers of the independent variable.~~

This is how to construct the system of equations and solve it:

A = np.stack([x1**0, x1**1, x1**2, x1**3, x1**4, x2**0, x2**1, x2**2, x2**3, x2**4]).T             
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx))  # test the result with original fitness function

Of course you can generalize over the degree:

A = np.stack([x1**p for p in range(degree)] + [x2**p for p in range(degree)]).T

With the example data, the least squares solution runs much faster than the minimize solution (800µs vs 35ms on my laptop). However, A can become quite large, so if memory is an issue minimize might still be an option.

Update:

Without any knowledge about the internals of the polynomial function things become tricky, but it is possible to separate terms and coefficients. Here is a somewhat ugly way to construct the system matrix A from a function like polyval:

def construct_A(valfunc, degree):
    columns1 = []
    columns2 = []
    for p in range(degree):
        c = np.zeros(degree)
        c[p] = 1
        columns1.append(valfunc(x1, c))
        columns2.append(valfunc(x2, c))
    return np.stack(columns1 + columns2).T

A = construct_A(np.polynomial.polynomial.polyval, 5)           
xx = np.linalg.lstsq(A, y)[0]
print(fitness(xx))  # test the result with original fitness function