I have player over time data that is missing player counts over several years. I'm trying to fill in/predict the missing player count data over different intervals.
Data available here: https://1drv.ms/u/s!AvEZ_QPY7OZuhJAlKJN89rH185SUhA
I'm following the instructions below that use KalmanRun to impute the missing values. I've tried 3 different approaches to transforming the data- using an xts object, and 2 approaches to converting it into time series data
https://stats.stackexchange.com/questions/104565/how-to-use-auto-arima-to-impute-missing-values
require(forecast)
library(xts)
library(anytime)
library(DescTools)
df_temp = read.csv("r_share.csv")
df_temp[['DateTime']] <- as.Date(strptime(df_temp[['DateTime']], format='%Y-%m-%d %H:%M:%S'))
3 approaches to convert data; xts seems to work best by returning non-zero data that is interpretable.
#Convert df_temp to TimeSeries object
df_temp = xts(df_temp$Players, df_temp$DateTime)
#df_temp = as.ts(log(df_temp$Players), start = start(df_temp$DateTime), end = end(df_temp$DateTime), frequency = 365)
#df_temp = ts(df_temp$Players, start = c(2013, 02, 02), end = c(2016, 01, 31), frequency = 365)
Fitting and plotting:
fit <- auto.arima(df_temp, seasonal = TRUE)
id.na <- which(is.na(df_temp))
kr <- KalmanRun(index(df_temp), fit$model, update = FALSE)
#?KalmanRun$tol
for (i in id.na)
df_temp[i] <- fit$model$Z %*% kr$states[i,]
plot(df_temp)
The expected output is data that mimics the variability seen in the actual data and is different for each interval, whereas the actual output is relatively stationary and unchanging (both intervals have nearly the same prediction).
It needs to be with model arima()
?.
Maybe you could try with another model, developed by Facebook named Prophet.
Here you can find the guide and github page.
If I understood you want something like this:
# Import library
library(prophet)
# Read data
df = read.csv("C:/Users/Downloads/r_share.csv",sep = ";")
# Transform to date
df["DateTime"] = as.Date(df$DateTime,format = "%d/%m/%Y")
# Change names for the model
colnames(df) = c("ds","y")
# call model
m = prophet(df)
# make "future" just one day greater than past
future = make_future_dataframe(m,periods = 1)
# predict the points
forecast = predict(m,future)
# plot results
plot(m,forecast)