
numpy: code to update least squares with more observations

I am looking for a numpy-based implementation of ordinary least squares that would allow the fit to be updated with more observations. Something along the lines of Applied Statistics algorithm AS 274 or R's biglm.

Failing that, a routine for updating a QR decomposition with new rows would also be of interest.

Any pointers?


  • scikits.statsmodels has a recursive OLS in the sandbox that updates the inverse of X'X and could be used for this. (It is currently used only to calculate recursive OLS residuals.)

    Nathaniel Smith posted his code for OLS when the data is too large to fit in memory to the scipy-user mailing list. The main code updates X'X.

    I think econpy also has a function for this.

    Pandas has an expanding OLS, but it may not be easy to use in an online fashion.

    Nathaniel's code might be the closest to biglm. I don't think there is anything for the general linear model (error covariance different from the identity).

    All of these need some work before they can be used for this. I don't know of any Python(-wrapped) code that would update a QR decomposition.

    update: see

    There is an incremental QR and Cholesky available in cholmod, but I didn't try it because of either license problems or problems compiling on Windows, and I don't think I tried to get incremental_qr to work (see attachments).
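
For the QR route asked about in the question, here is a minimal numpy-only sketch. It is not AS 274, biglm, or any of the packages mentioned above; the class name `IncrementalOLS` and its methods are made up for illustration. The idea is to keep only the triangular factor of the augmented matrix [X | y] and, when new rows arrive, stack them under that factor and re-triangularize with `np.linalg.qr(..., mode="r")`. It assumes the design matrix has full column rank and that at least as many rows as columns have been seen before the coefficients are requested.

```python
import numpy as np


class IncrementalOLS:
    """Hypothetical helper (not from any library): OLS updated block by block.

    The state is the triangular factor of the augmented matrix [X | y], so
    memory use is O(p^2) no matter how many observations have been seen.
    """

    def __init__(self, n_features):
        self.p = n_features
        self.R_aug = np.zeros((0, n_features + 1))  # no observations yet
        self.n_obs = 0

    def update(self, X_new, y_new):
        """Fold a block of new rows (X_new, y_new) into the factorization."""
        X_new = np.asarray(X_new, dtype=float).reshape(-1, self.p)
        y_new = np.asarray(y_new, dtype=float).reshape(-1, 1)
        block = np.hstack([X_new, y_new])
        # Re-triangularize the old factor stacked on the new rows. This keeps
        # R_aug' R_aug equal to [X | y]' [X | y] over all rows seen so far.
        self.R_aug = np.linalg.qr(np.vstack([self.R_aug, block]), mode="r")
        self.n_obs += block.shape[0]

    def coef(self):
        """Solve R beta = Q'y; needs at least p rows and full column rank."""
        R = self.R_aug[:self.p, :self.p]
        qty = self.R_aug[:self.p, self.p]
        return np.linalg.solve(R, qty)

    def rss(self):
        """Residual sum of squares, read off the last diagonal entry."""
        if self.R_aug.shape[0] <= self.p:
            return 0.0
        return float(self.R_aug[self.p, self.p] ** 2)


# Usage on synthetic data: feed 200 observations in four blocks of 50.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

model = IncrementalOLS(n_features=3)
for start in range(0, 200, 50):
    model.update(X[start:start + 50], y[start:start + 50])

beta = model.coef()
# Batch fit on all the data at once, for comparison only.
beta_full, res_full, _, _ = np.linalg.lstsq(X, y, rcond=None)
```

Compared with accumulating X'X and X'y directly, this works on the triangular factor rather than on the cross-product matrix, which is usually better conditioned. The cost is one small QR per block, on a matrix with at most (p + 1) + block_size rows.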