Search code examples
rfor-loopformulamodelingautoregressive-models

Problems with modeling AR(1) with three different techniques in R: 1. the formula 2.for-loop 3.arima.sim


(reproducible example added)
I tried to model Yt=0.6Yt-1+Vt (Vt ~ N(0,1)) AR(1) with three different techniques.

1. The formula Ytt[y1/ρ + ∑j=2 to t (α+Vj)/ρj] of Yt=α+ρYt-1+Vt obtained via back substitution and using cumulative sum.

2. The classical for loop

3. arima.sim simulation.

While I expect to get more or less the same things, interesting and unexpected things happened:

N <- 1388; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))

# Formula technique
y1 <- ts(rep(0,N))  # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))

# The classical "for" loop
y2 <- ts(rep(0,N))  # y2[1]:=0 defined    
for (t in 2:N){  y2[t] <- a + ro*y2[t-1]+v[t] }

# arima.sim simulation 
set.seed(1) 
y3 <- arima.sim(model=list(ar=0.6), n=1388, mean=0, sd=sqrt(1)) 
# change n in arima.sim accordingly such that n=N simultaneously with the N definition above


c(mean(y1),sd(y1))
c(mean(y2),sd(y2))
c(mean(y3),sd(y3))

N=1388 (and n=1388) gives:

[1] -0.03713488  1.26102077
[1] -0.03713488  1.26102077
[1] -0.01048798  1.28445899

N=1389 (and n=1389) gives:

[1] Inf NaN
[1] -0.03661779  1.26071373
[1] -0.01048798  1.28445899

The standard deviation values are just as expected:
StdDev(Yt)=sqrt[sd(v)2/(1-ρ2)]

sqrt(sd(v)^2/(1-(0.6)^2)) # 1.284229

Strangenesses appear for the formula technique:

c(mean(y1),sd(y1)) is defined for N<=1388
c(mean(y1),sd(y1)) is (Inf, NaN) for N=1389 and N=1390
c(mean(y1),sd(y1)) is (NaN, NA) for N>=1391.

Question:
1. Why do the formula technique fail for N>=1989?
(though we have a stationary series with ρ=0.6)

2. Why do we get c(mean(y1),sd(y1))#(Inf, NaN) for N=1389 and N=1390 and c(mean(y1),sd(y1))#(NaN, NA) for N>=1391?


Solution

  • With N <- 1389; ro <- 0.6; a <- 0 the last element of cumsum((a+v[-1]) / ro^(2:N)) is larger than the maximally representable number on your PC i.e. your .Machine$double.xmax. (I am assuming here it is maximally 1.797693e+308. Thus it is represented as Inf. This causes the last element of y1 to become an Inf as well as any legal arithmetic operation on Inf and a numeric yields Inf. sd of a vector conatining an Inf is a NaN:

    N <- 1389; ro <- 0.6; a <- 0
    set.seed(1)
    v <- ts(rnorm(N,0,1))
    
    # Formula technique
    y1 <- ts(rep(0,N))  # y[1]:=0 defined
    y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))
    
    tail((a+v[-1]) / ro^(2:N))
    #[1] -8.183997e+306  1.049540e+307 -1.581977e+306 -3.189985e+307 -7.577105e+307
    #[6]            Inf
    tail(cumsum((a+v[-1]) / ro^(2:N)))
    #[1] -1.814422e+307 -7.648815e+306 -9.230793e+306 -4.113064e+307 -1.169017e+308
    #[6]            Inf
    tail(y1)
    #[1] -1.6598499 -0.4198324 -0.3039989 -0.8127365 -1.3859780        Inf 
    

    N <- 1390; ro <- 0.6; a <- 0. The last value of (a+v[-1]) / ro^(2:N) is representable and thus nothing changes.

    N <- 1390; ro <- 0.6; a <- 0
    set.seed(1)
    v <- ts(rnorm(N,0,1))
    
    # Formula technique
    y1 <- ts(rep(0,N))  # y[1]:=0 defined
    y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))
    
    tail((a+v[-1]) / ro^(2:N))
    #[1]  1.049540e+307 -1.581977e+306 -3.189985e+307 -7.577105e+307            Inf
    #[6] -5.546904e+307
    tail(cumsum((a+v[-1]) / ro^(2:N)))
    #[1] -7.648815e+306 -9.230793e+306 -4.113064e+307 -1.169017e+308            Inf
    #[6]            Inf
    tail(y1)
    #[1] -0.4198324 -0.3039989 -0.8127365 -1.3859780        Inf        Inf 
    
    

    N <- 1391; ro <- 0.6; a <- 0 the last value of (a+v[-1]) / ro^(2:N) becomes -Inf (smaller than .Machine$double.xmin) and the cumsum has to add Inf and -Inf, which gives a NaN, which means not a number. The mean of NaN s is not a number as well and the sd(NaN) is NA.

    N <- 1391; ro <- 0.6; a <- 0
    set.seed(1)
    v <- ts(rnorm(N,0,1))
    
    # Formula technique
    y1 <- ts(rep(0,N))  # y[1]:=0 defined
    y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))
    
    
    tail((a+v[-1]) / ro^(2:N))
    #[1] -1.581977e+306 -3.189985e+307 -7.577105e+307            Inf -5.546904e+307
    #[6]           -Inf
    tail(cumsum((a+v[-1]) / ro^(2:N)))
    #[1] -9.230793e+306 -4.113064e+307 -1.169017e+308            Inf            Inf
    #[6]            NaN
    tail(y1)
    #[1] -0.3039989 -0.8127365 -1.3859780        Inf        Inf        NaN