Tags: python, pandas, least-squares, statsmodels

Solving linearised least squares using statsmodels


I'm trying to translate a simple linearised least-squares problem to statsmodels, to learn how to use it for iterative least squares:

The (contrived) data comprise measurements of the time it takes for a ball to drop a given distance.

distance (m)    time (s)
10          1.430
20          2.035
30          2.460
40          2.855

From these measurements I want to determine the acceleration due to gravity, g, using

$t = \sqrt{2s/g}$, where s is the distance fallen and t is the time taken.
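Linearising this model needs only one derivative, that of t with respect to g. Writing the model as $F(g)$, differentiating gives the expression used for the `A_matrix` column in the code (a worked derivation, not part of the original question):

$$
F(g) = \sqrt{2s}\, g^{-1/2},
\qquad
A = \frac{\partial F}{\partial g} = -\frac{1}{2}\sqrt{2s}\, g^{-3/2} = -\sqrt{\frac{s}{2}}\, g^{-3/2}
$$

For example, at $s = 10$ and $g = 10$ this gives $A = -\sqrt{5}\cdot 10^{-3/2} \approx -0.0707$.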

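This is (obviously) non-linear, but I can linearise it with a first-order Taylor expansion about a provisional value $x^-$: $F(x^- + \delta x) \approx F(x^-) + A\,\delta x = l_0 + v$, where $l_0$ are the observations, $v$ the residuals and $A = \partial F/\partial x$ evaluated at $x^-$. Least squares then gives the correction $\delta x = (A^\top A)^{-1} A^\top (l_0 - F(x^-))$. I use a provisional value of 10 for g to calculate F(g), apply the correction, and iterate if necessary: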

import numpy as np
import pandas as pd

measurements = pd.DataFrame({
    'distance': [10, 20, 30, 40],
    'time': [1.430, 2.035, 2.460, 2.855]
})

prov_g = 10.0  # provisional value for g
# Model evaluated at the provisional value: F(g) = sqrt(2 s / g)
measurements['fg'] = np.sqrt(2 * measurements['distance'] / prov_g)
# Design matrix (Jacobian): dF/dg = -sqrt(s / 2) * g**-1.5
measurements['A_matrix'] = -np.sqrt(measurements['distance'] / 2) * prov_g ** -1.5
# Misclosure: observed minus computed
measurements['b'] = measurements['time'] - measurements['fg']

A = measurements[['A_matrix']].to_numpy()  # (n, 1) design matrix
b = measurements['b'].to_numpy()           # (n,) misclosure vector
# Solve the normal equations (A^T A) dx = A^T b instead of inverting elementwise
dx = np.linalg.solve(A.T @ A, A.T @ b)
updated_g = prov_g + dx[0]
print(round(updated_g, 3))

>>> 9.807
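The "iterate if necessary" part can be sketched as a plain Gauss-Newton loop that repeats the same linearised step until the correction becomes negligible. This is a minimal sketch, assuming NumPy's `np.linalg.lstsq` for the linear solve; the function name `gauss_newton_g` and the `tol`/`max_iter` settings are hypothetical choices for illustration.

```python
import numpy as np

s = np.array([10.0, 20.0, 30.0, 40.0])          # distance fallen (m)
t_obs = np.array([1.430, 2.035, 2.460, 2.855])  # time taken (s)


def gauss_newton_g(g0, tol=1e-10, max_iter=20):
    """Hypothetical helper: repeat the linearised least-squares step for g."""
    g = float(g0)
    for _ in range(max_iter):
        f_g = np.sqrt(2 * s / g)                    # model F(g) at the current value
        A = (-np.sqrt(s / 2) * g ** -1.5)[:, None]  # Jacobian dF/dg, shape (n, 1)
        b = t_obs - f_g                             # observed minus computed
        # Least-squares correction for the linearised problem
        dg = np.linalg.lstsq(A, b, rcond=None)[0][0]
        g += dg
        if abs(dg) < tol:                           # stop once the correction is tiny
            break
    return g


g_hat = gauss_newton_g(10)
print(round(g_hat, 3))
```

With this data the first pass reproduces the single-step value of 9.807, and the later passes should settle slightly higher, at about 9.8095, so iterating matters at the third decimal. Because t is linear in $g^{-1/2}$, the converged value can also be cross-checked with an ordinary linear fit on $u = g^{-1/2}$.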

What I can't figure out from the examples is how to use statsmodels to do what I've just done manually (linearising the problem, then solving it with matrix multiplication).


Solution

  • statsmodels is not directly of any help here, at least not yet.

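    I think your linearized non-linear least-squares optimization is essentially a Gauss-Newton step, which is close to what scipy.optimize.leastsq does internally (it uses MINPACK's Levenberg-Marquardt algorithm, a damped variant of Gauss-Newton). There are more user-friendly or extended interfaces built on it, for example scipy.optimize.curve_fit or the lmfit package.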

    Statsmodels currently does not have a generic version of an equivalent iterative solver.

    Statsmodels uses iteratively reweighted least squares as the optimizer in several models, such as GLM and RLM. However, those are model-specific implementations. In those cases, statsmodels uses WLS (weighted least squares) to compute the equivalent of your linearized solution when calculating each next step.