Search code examples
rtime-seriesregressionnon-linear-regression

ARIMA model with nonlinear exogenous variable in R


I'm doing a non-linear regression in R and want to add one moving-average term to my model to eliminate the autocorrelations in residuals.

Basically, here is the model:

y[n] = a + log((x1[n])^g + (x2[n])^g) + c*e[n-1] + e[n]

where [e] is the moving average term.

I plan to use ARIMA(0, 0, 1) to model residuals. However, I do not know which function I should use in R to add non-linear exogenous part to ARIMA model.

More information: I know how to use nls command to estimate a and g, but do not know how to deal with e[n].

I know that xreg in arima can handle ARIMA model with linear exogenous variables. Is there a similar function to handle ARIMA model with nonlinear exogenous variables?

Thank you for the help in advance!


Solution

  • nlme has such capability, as it is fitting non-linear mixed models. You can think of it an extension to nls (a fixed-effect only non-linear regression), by allowing random effect and correlated errors.

    nlme can handle ARMA correlation, by something like correlation = corARMA(0.2, ~ 1, p = 0, q = 1, fixed = TRUE). This means, that residuals are MA(1) process, with initial guess of coefficient 0.2, but to be updated during model fitting. The ~ 1 suggests that MA(1) is on intercept and there is no further grouping structure.

    I am not an expert in nlme, but I know nlme is what you need. I produce the following example, but since I am not an expert, I can't get nlme work at the moment. I post it here to give a start / flavour.

    set.seed(0)
    x1 <- runif(100)
    x2 <- runif(100)
    ## MA(1) correlated error, with innovation standard deviation 0.1
    e <- arima.sim(model = list(ma = 0.5), n = 100, sd = 0.1)
    ## a true model, with `a = 0.2, g = 0.5`
    y0 <- 0.2 + log(x1 ^ 0.5 + x2 ^ 0.5)
    ## observations
    y <- y0 + e
    
    ## no need to install; it comes with R; just `library()` it
    library(nlme)
    
    fit <- nlme(y ~ a + log(x1 ^ g + x2 ^ g), fixed = a + g ~ 1,
                start = list(a = 0.5, g = 1),
                correlation = corARMA(0.2, form = ~ 1, p = 0, q = 1, fixed = FALSE))
    

    Similar to nls, we have an overall model formula y ~ a + log(x1 ^ g + x2 ^ g), and starting values are required for iteration process. I have chosen start = list(a = 0.5, g = 1). The correlation bit has been explained in the beginning.

    fixed and random arguments in nlme specify what should be seen as fixed effects and random effects in the overall formula. Since we have no random effect, we leave it unspecified. We want a and g as fixed effect, so I tried something like fixed = a + g ~ 1. Unfortunately it does not quite work, for some reason I don't know. I read the ?nlme, and thought this formula means that we want a common a and g for all observations, but later nlme reports an error saying this is not a valid group formula.

    I am also investing at this; as I said, the above gives us a start. We are already fairly close to the final answer.


    Thanks to user20650 for point out my awkward error. I should use gnls function rather than nlme. By design nature of nlme package, functions lme and nlme have to take a random argument to work. Luckily, there are several other routines in nlme package for extending linear models and non-linear models.

    • gls and gnls extend lm and nls by allowing non-diagonal variance functions.

    So, I should really use gnls instead:

    ## no `fixed` argument as `gnls` is a fixed-effect only
    fit <- gnls(y ~ a + log(x1 ^ g + x2 ^ g), start = list(a = 0.5, g = 1),
                correlation = corARMA(0.2, form = ~ 1, p = 0, q = 1, fixed = FALSE))
    
    #Generalized nonlinear least squares fit
    #  Model: y ~ a + log(x1^g + x2^g) 
    #  Data: NULL 
    #  Log-likelihood: 92.44078
    #
    #Coefficients:
    #        a         g 
    #0.1915396 0.5007640 
    #
    #Correlation Structure: ARMA(0,1)
    # Formula: ~1 
    # Parameter estimate(s):
    #   Theta1 
    #0.4184961 
    #Degrees of freedom: 100 total; 98 residual
    #Residual standard error: 0.1050295