time-series hierarchical-clustering hierarchical-bayesian multivariate-time-series numpyro

How to predict time series with limited data


I have a dataset with four columns: date, category, product, rate (%). I would like to forecast the rate for every product in my data. The major issue is that products constantly come in and out of production, so certain products have very little historical data, which makes prediction difficult. I've read online that people with similar issues have used Bayesian hierarchical models, like this example from NumPyro:

import numpy as np
import numpyro
from numpyro.infer import MCMC, NUTS, Predictive
import numpyro.distributions as dist
from jax import random
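def model(PatientID, Weeks, FVC_obs=None):
    # Population-level hyperpriors shared by all patients
    μ_α = numpyro.sample("μ_α", dist.Normal(0., 100.))
    σ_α = numpyro.sample("σ_α", dist.HalfNormal(100.))
    μ_β = numpyro.sample("μ_β", dist.Normal(0., 100.))
    σ_β = numpyro.sample("σ_β", dist.HalfNormal(100.))
    
    # PatientID is used as an index below, so it is assumed to be
    # integer-coded as 0 .. n_patients - 1
    unique_patient_IDs = np.unique(PatientID)
    n_patients = len(unique_patient_IDs)
    
    # Per-patient intercept (α) and slope (β), partially pooled
    # through the hyperpriors above
    with numpyro.plate("plate_i", n_patients):
        α = numpyro.sample("α", dist.Normal(μ_α, σ_α))
        β = numpyro.sample("β", dist.Normal(μ_β, σ_β))
    
    # Observation noise and the linear predictor for each row
    σ = numpyro.sample("σ", dist.HalfNormal(100.))
    FVC_est = α[PatientID] + β[PatientID] * Weeks
    
    # Likelihood; when FVC_obs is None this site is sampled rather than observed
    with numpyro.plate("data", len(PatientID)):
        numpyro.sample("obs", dist.Normal(FVC_est, σ), obs=FVC_obs)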

But every example I've found online has only shown code examples of linear regression being used within the hierarchical model. Is it possible to use hierarchical models to predict for data that is non-linear? Does anyone have experience with using hierarchical models, specifically for time series data?


Solution

  • I think what you are looking for is a simulation, which you can build from summary statistics of the data you have.

    You could generate synthetic data by drawing values around the mean rate, with a spread no larger than the difference between the maximum value and the mean. I have never done this myself, but I think it is doable. To be honest, I would try the machine learning route.

    Either way, the simulated data will not be representative of reality, which is why people use linear regression as a reference ("the results should be around this value") rather than as a prediction as such. That is speaking from a business perspective. If what you need is more data, then I would look into simulation.
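
One possible reading of the simulation idea above, as a rough sketch: draw synthetic rates uniformly around the observed mean, with the spread set to the largest observed deviation from that mean. The helper name `simulate_rates`, the uniform distribution, and the sample values are assumptions made for illustration; the answer does not specify any of them.

```python
import numpy as np


def simulate_rates(observed, n_samples, seed=0):
    """Draw synthetic rates around the observed mean (hypothetical helper)."""
    observed = np.asarray(observed, dtype=float)
    mean = observed.mean()
    # Spread = largest absolute deviation of any observation from the mean.
    spread = np.abs(observed - mean).max()
    rng = np.random.default_rng(seed)
    draws = rng.uniform(mean - spread, mean + spread, size=n_samples)
    # Rates are percentages, so keep them inside [0, 100].
    return np.clip(draws, 0.0, 100.0)


# Three made-up observations for a product with little history.
observed = [4.1, 3.8, 4.6]
synthetic = simulate_rates(observed, n_samples=1000)
```

As the answer notes, data generated this way carries no information beyond the mean and spread it was built from, so it pads the sample size without adding any trend or seasonality.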