Search code examples
pythonnumpyvectorizationpolynomials

how to vectorize forecasting when using polynomials in numpy


essentially would like to find the most efficient solution (numpy) that essentially allows me to extend np.poly1d to K dimensions.

test case:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


class Polyfit:

    @staticmethod
    def from_fit_to_forecast(df, forecast_values, dates_forward, x_data, y_data, order=2):
        # nice vectorized params estimation
        all_params = np.polyfit(x_data, y_data, order)

        # terrible fit of data as I loop over them
        new_df = pd.DataFrame([np.poly1d(i)(x_data) for i in all_params.T], columns=df.index, index=df.columns).T
        forecast_df_second = pd.DataFrame(
            [np.poly1d(i)(forecast_values) for i in all_params.T], columns=dates_forward, index=df.columns).T

        return new_df, forecast_df_second

    @staticmethod
    def gen_data(k_steps):
        data = 1 + np.random.rand(100, 4) / 300 - (np.random.rand(100, 4) / 10) ** 2
        dates = pd.date_range('2010-1-1', freq='D', periods=100)
        dates_forward = pd.date_range(max(dates) + pd.Timedelta(1, unit='d'), freq='D', periods=k_steps)
        return pd.DataFrame(data, columns=list('ABCD'), index=dates).cumprod(), dates_forward

    def __init__(self, k_steps_forward=20):
        self.original_data, dates_forward = self.gen_data(k_steps_forward)

        x_data = list(range(len(self.original_data.index)))
        max_x_data = max(x_data)
        forecast_values = list(range(max_x_data + 1, max_x_data + 1 + k_steps_forward, 1))
        y_data = self.original_data.values

        self.fit_df_2, self.forecast_2 = self.from_fit_to_forecast(
            self.original_data, forecast_values, dates_forward, x_data, y_data, order=2)

cls = Polyfit(k_steps_forward=30)

print(cls.fit_df_2)
print(cls.forecast_2)

the critical point is in the from_fit_to_forecast where I do this:

[np.poly1d(i)(forecast_values) for i in all_params.T]

which slows things down considerably. Also, since I would also be using the 2nd order polynomial, I tried playing around with np.dot or other things that work with matrixes but no avail.

any suggestions?


Solution

  • So you got a bunch of polynomial coefficients from

    all_params = np.polyfit(x_data, y_data, order)
    

    (where y_data is a 2D array) and you want to evaluate all of them at the points x_data. A vectorized way to do this, as explained below, is:

    (x_data.reshape(-1, 1)**np.arange(order, -1, -1)).dot(all_params)
    

    Here is a small example where the fit is perfect (2nd degree polys through three points), so you can see that the evaluation is correct

    x_data = np.array([1, 2, 3])
    y_data = np.array([[5, 6,], [9, 8], [7, 4]])
    order = 2
    all_params = np.polyfit(x_data, y_data, order)
    (x_data.reshape(-1, 1)**np.arange(order, -1, -1)).dot(all_params)
    

    outputs

    array([[ 5.,  6.],
           [ 9.,  8.],
           [ 7.,  4.]])
    

    Explanation

    x_data.reshape(-1, 1)**np.arange(order, -1, -1) creates a matrix of powers of x_data points, starting from highest, e.g.,

    x1**2 x1**1 x1**0
    x2**2 x2**1 x2**0
    

    This matrix gets multiplied, by way of matrix multiplication, with the coefficients of quadratic ax**2 + bx + c, which looks like

    a1 a2
    b1 b2
    c1 c2
    

    The result is exactly the values of polynomials.