Tags: r, datetime, time-series, kalman-filter, forecast

Imputed predictions for missing time-series data are nearly constant (flat line)


I have player-count data over time, but it is missing player counts over several years. I'm trying to fill in (predict) the missing player counts for the different missing intervals.

Data available here: https://1drv.ms/u/s!AvEZ_QPY7OZuhJAlKJN89rH185SUhA

I'm following the instructions linked below, which use KalmanRun to impute the missing values. I've tried three different ways of transforming the data: using an xts object, and two ways of converting it into a ts (time-series) object.

https://stats.stackexchange.com/questions/104565/how-to-use-auto-arima-to-impute-missing-values

library(forecast)
library(xts)
library(anytime)
library(DescTools)

# Read the data and parse the timestamps, keeping only the date part
df_temp = read.csv("r_share.csv")
df_temp[['DateTime']] <- as.Date(strptime(df_temp[['DateTime']], format='%Y-%m-%d %H:%M:%S'))

I tried three approaches to converting the data; xts seems to work best, since it is the one that returns non-zero, interpretable values.

#Convert df_temp to TimeSeries object

df_temp = xts(df_temp$Players, df_temp$DateTime)
#df_temp = ts(log(df_temp$Players), start = c(2013, 33), frequency = 365)
#df_temp = ts(df_temp$Players, start = c(2013, 33), frequency = 365)
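For the ts() route, note that ts() expects start as c(year, period); with frequency = 365 the period is the day of the year (33 for 2013-02-02), and the end is implied by the length of the data. A minimal sketch of building the daily series, assuming one row per day sorted by date, with missing counts stored as NA; the helper name make_daily_ts and the five example values are hypothetical, not taken from the question's data:

```r
# Hypothetical helper: build a daily ts from a Date vector and a value vector.
# Assumes one row per day, sorted by date, with missing counts stored as NA.
make_daily_ts <- function(dates, values, frequency = 365) {
  first <- min(dates)
  year <- as.integer(format(first, "%Y"))
  day_of_year <- as.integer(format(first, "%j"))  # e.g. 2013-02-02 -> 33
  ts(values, start = c(year, day_of_year), frequency = frequency)
}

# Usage with illustrative values only
dates <- seq(as.Date("2013-02-02"), by = "day", length.out = 5)
x <- make_daily_ts(dates, c(10, 12, NA, 11, 13))
start(x)  # c(2013, 33)
```

With the question's data this would be called as make_daily_ts(df_temp$DateTime, df_temp$Players) on the data frame, before it is overwritten by the xts conversion.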

Fitting and plotting:

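fit <- auto.arima(df_temp, seasonal = TRUE)  # fit an ARIMA model to the series
id.na <- which(is.na(df_temp))              # positions of the missing values

# Run the Kalman filter with the fitted state-space model
# (the first argument passed here is index(df_temp), i.e. the dates)
kr <- KalmanRun(index(df_temp), fit$model, update = FALSE)

# Replace each missing value with the model's prediction, Z %*% state
for (i in id.na)
  df_temp[i] <- fit$model$Z %*% kr$states[i,]

plot(df_temp)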

I expect the imputed values to mimic the variability seen in the actual data and to differ between the missing intervals. Instead, the output is nearly constant, and both intervals get almost the same prediction.
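One thing that may be worth checking in the code above: the documented signature is KalmanRun(y, mod, nit = 0L, update = FALSE), where y is the observed series, but the call passes index(df_temp), i.e. the dates, rather than df_temp. The sketch below applies the same filter-and-project idea to the series itself. It is not run on the question's data: it assumes the built-in log(AirPassengers) series with an artificial 11-month gap, and a fixed seasonal ARIMA(0,1,1)(0,1,1) model fitted with stats::arima() in place of auto.arima(), so it needs only base R. It also shows KalmanSmooth() as an alternative to KalmanRun().

```r
# Sketch: impute a gap using a fitted ARIMA model in state-space form.
# Data: built-in log(AirPassengers) with months 90-100 blanked out (illustration only).
x <- log(AirPassengers)
x[90:100] <- NA

# Seasonal ARIMA(0,1,1)(0,1,1)_12; arima() accepts NAs in the series
fit <- arima(x, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1)))
id.na <- which(is.na(x))

# Filter: pass the series x itself (not its time index) as the observations
kr <- KalmanRun(x, fit$model)
x_run <- x
for (i in id.na)
  x_run[i] <- sum(fit$model$Z * kr$states[i, ])  # Z %*% state: model value of the observation

# Smoother: also uses the observations after the gap
ks <- KalmanSmooth(x, fit$model)
x_smooth <- x
for (i in id.na)
  x_smooth[i] <- sum(fit$model$Z * ks$smooth[i, ])

round(exp(x_smooth[id.na]))  # imputed passenger counts for the gap
```

KalmanRun() is a filter, so each imputed value uses only the observations before it; KalmanSmooth() also uses the observations after the gap, which is usually preferable for gaps in the middle of a series. The object returned by auto.arima() also carries a fit$model component, as the question's own use of fit$model$Z shows.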


Solution

Does it need to be an arima() model? If not, you could try Prophet, a forecasting model developed by Facebook. Here you can find the guide and GitHub page.

If I understood correctly, you want something like this:

    # Import library
    library(prophet)

    # Read the data (adjust sep and the date format below to match your file)
    df <- read.csv("C:/Users/Downloads/r_share.csv", sep = ";")

    # Transform to date
    df$DateTime <- as.Date(df$DateTime, format = "%d/%m/%Y")

    # Prophet expects the columns to be named ds (date) and y (value)
    colnames(df) <- c("ds", "y")

    # Fit the model
    m <- prophet(df)

    # Dates to predict: the historical dates plus one day past the last observation
    future <- make_future_dataframe(m, periods = 1)

    # Predict the points
    forecast <- predict(m, future)

    # Plot results
    plot(m, forecast)

[Image: output of plot(m, forecast)]
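If the goal is to fill the gaps rather than forecast past the end, the fitted Prophet model can be asked for predictions on the observed dates, and its yhat used only where y is missing. Prophet's documentation notes that it handles missing data, and history rows with NA in y are left out of the fit. A minimal sketch, assuming a two-column ds/y data frame with the missing days present as rows whose y is NA; the synthetic series and the y_filled column name are hypothetical, for illustration only:

```r
library(prophet)

# Synthetic daily data for illustration (not the question's data):
# weekly pattern plus noise, with a 31-day gap set to NA.
set.seed(1)
ds <- seq(as.Date("2013-02-02"), by = "day", length.out = 400)
y <- 100 + 10 * sin(2 * pi * seq_along(ds) / 7) + rnorm(400)
y[150:180] <- NA
df <- data.frame(ds = ds, y = y)

# Sort by date so the prediction rows line up with the rows of df
df <- df[order(df$ds), ]

m <- prophet(df)                              # rows with NA y are not used for fitting
fc <- predict(m, data.frame(ds = df$ds))      # one prediction per row of df

# Keep observed values and fill only the gaps with the model's prediction
df$y_filled <- ifelse(is.na(df$y), fc$yhat, df$y)
```

Note that yhat is the model's mean prediction, so the filled values follow the fitted trend and seasonality but not the day-to-day noise of the observed data.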