Search code examples
rtime-seriesforecasting

What is the computational complexity of arima() function in R?


If my time series has n members and I want to fit ARIMA(1,0,1) model, what will be the complexity in big O notation?

In the following example I need to know the complexity of second line of code:

series <- arima.sim(list(order=c(1,0,1), ar=.9, ma=-.5), n=1000)
result <- arima(series, order=c(1,0,1))

Thanks.


Solution

  • It is O(n) complexity. A story for this? See below.

    As I said in the comment, we can measure it by a regression model. As a toy demonstration, consider the following procedure for data collection and regression.

    We first define a function to measure model fitting time of an ARMA(1,1) model (or ARIMA(1,0,1)). (Note that I am using the basic system.time() here. You may consider using microbenchmark() from package microbenchmark. But in the following, I will use reasonably large n, to reduce the sensitivity in time measuring.)

    t_arma <- function(N) {
      series <- arima.sim(list(order=c(1,0,1), ar=.9, ma=-.5), n=N)
      as.numeric((system.time(arima(series, order=c(1,0,1)))))[3]
      }
    

    We need to collect, say 100, data. We try 100 increasingly large n, and measure the model fitting time t.

    k <- 100; t <- numeric(k)
    n <- seq(20000, by = 1000, length = k)  ## start from n = 20000 (big enough)
    for (i in 1:k) t[i] <- t_arma(n[i])
    

    Now, if we assume the complexity is: a * (n ^ b), or O(n ^ b), we can estimate a, b by a regression model:

    log(t) ~ log(a) + b * log(n)
    

    and we are particularly interested in the slop estimate: b.

    So let's call lm()

    fit <- lm(log(t) ~ log(n))
    
    #Coefficients:
    #(Intercept)       log(n)  
    #    -9.2185       0.8646  
    

    We can also sketch the scatter plot of log(n) v.s. log(t), as well as the fitted line:

    plot(log(n), log(t), main = "scatter plot")
    lines(log(n), fit$fitted, col = 2, lwd = 2)
    

    fitted model

    There are some outliers in the beginning, hence the slope is lower than it should be. We now consider removing outliers and refit model for robustness.

    Outliers are featured with large residuals. Let's have a look:

    plot(fit$resi, main = "residuals")
    

    residuals

    We can mark and remove those outliers. Looks like 0.5 is a good enough threshold to filter those outliers.

    exclude <- fit$resi > 0.5
    n <- n[!exclude]
    t <- t[!exclude]
    

    Now we refit the linear model and make plot:

    robust_fit <- lm(log(t) ~ log(n))
    
    #Coefficients:
    #(Intercept)       log(n)  
    #    -11.197        1.039  
    
    plot(log(n), log(t), main = "scatter plot (outliers removed)")
    lines(log(n), robust_fit$fitted, col = 2, lwd = 2)
    

    enter image description here

    Wow, great, we are golden! The slope estimate is 1. So the O(n) complexity is justified!