(reproducible example added)
I tried to model Yt=0.6Yt-1+Vt (Vt ~ N(0,1)) AR(1) with three different techniques.
1. The formula Yt =ρt[y1/ρ + ∑j=2 to t (α+Vj)/ρj] of Yt=α+ρYt-1+Vt obtained via back substitution and using cumulative sum.
2. The classical for
loop
3. arima.sim
simulation.
While I expect to get more or less the same things, interesting and unexpected things happened:
N <- 1388; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))
# Formula technique
y1 <- ts(rep(0,N)) # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))
# The classical "for" loop
y2 <- ts(rep(0,N)) # y2[1]:=0 defined
for (t in 2:N){ y2[t] <- a + ro*y2[t-1]+v[t] }
# arima.sim simulation
set.seed(1)
y3 <- arima.sim(model=list(ar=0.6), n=1388, mean=0, sd=sqrt(1))
# change n in arima.sim accordingly such that n=N simultaneously with the N definition above
c(mean(y1),sd(y1))
c(mean(y2),sd(y2))
c(mean(y3),sd(y3))
N=1388 (and n=1388) gives:
[1] -0.03713488 1.26102077
[1] -0.03713488 1.26102077
[1] -0.01048798 1.28445899
N=1389 (and n=1389) gives:
[1] Inf NaN
[1] -0.03661779 1.26071373
[1] -0.01048798 1.28445899
The standard deviation values are just as expected:
StdDev(Yt)=sqrt[sd(v)2/(1-ρ2)]
sqrt(sd(v)^2/(1-(0.6)^2)) # 1.284229
Strangenesses appear for the formula technique:
c(mean(y1),sd(y1))
is defined for N<=1388
c(mean(y1),sd(y1))
is (Inf, NaN) for N=1389 and N=1390
c(mean(y1),sd(y1))
is (NaN, NA) for N>=1391.
Question:
1. Why do the formula technique fail for N>=1989?
(though we have a stationary series with ρ=0.6)
2. Why do we get c(mean(y1),sd(y1))#(Inf, NaN)
for N=1389 and N=1390 and
c(mean(y1),sd(y1))#(NaN, NA)
for N>=1391?
With N <- 1389; ro <- 0.6; a <- 0
the last element of cumsum((a+v[-1]) / ro^(2:N))
is larger than the maximally representable number on your PC i.e. your .Machine$double.xmax
. (I am assuming here it is maximally 1.797693e+308
. Thus it is represented as Inf
. This causes the last element of y1 to become an Inf
as well as any legal arithmetic operation on Inf
and a numeric yields Inf
. sd
of a vector conatining an Inf
is a NaN
:
N <- 1389; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))
# Formula technique
y1 <- ts(rep(0,N)) # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))
tail((a+v[-1]) / ro^(2:N))
#[1] -8.183997e+306 1.049540e+307 -1.581977e+306 -3.189985e+307 -7.577105e+307
#[6] Inf
tail(cumsum((a+v[-1]) / ro^(2:N)))
#[1] -1.814422e+307 -7.648815e+306 -9.230793e+306 -4.113064e+307 -1.169017e+308
#[6] Inf
tail(y1)
#[1] -1.6598499 -0.4198324 -0.3039989 -0.8127365 -1.3859780 Inf
N <- 1390; ro <- 0.6; a <- 0
. The last value of (a+v[-1]) / ro^(2:N)
is representable and thus nothing changes.
N <- 1390; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))
# Formula technique
y1 <- ts(rep(0,N)) # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))
tail((a+v[-1]) / ro^(2:N))
#[1] 1.049540e+307 -1.581977e+306 -3.189985e+307 -7.577105e+307 Inf
#[6] -5.546904e+307
tail(cumsum((a+v[-1]) / ro^(2:N)))
#[1] -7.648815e+306 -9.230793e+306 -4.113064e+307 -1.169017e+308 Inf
#[6] Inf
tail(y1)
#[1] -0.4198324 -0.3039989 -0.8127365 -1.3859780 Inf Inf
N <- 1391; ro <- 0.6; a <- 0
the last value of (a+v[-1]) / ro^(2:N)
becomes -Inf
(smaller than .Machine$double.xmin
) and the cumsum has to add Inf
and -Inf
, which gives a NaN
, which means not a number. The mean
of NaN
s is not a number as well and the sd(NaN)
is NA
.
N <- 1391; ro <- 0.6; a <- 0
set.seed(1)
v <- ts(rnorm(N,0,1))
# Formula technique
y1 <- ts(rep(0,N)) # y[1]:=0 defined
y1[-1] <- ro^(2:N) * (y1[1]/ro + cumsum((a+v[-1]) / ro^(2:N)))
tail((a+v[-1]) / ro^(2:N))
#[1] -1.581977e+306 -3.189985e+307 -7.577105e+307 Inf -5.546904e+307
#[6] -Inf
tail(cumsum((a+v[-1]) / ro^(2:N)))
#[1] -9.230793e+306 -4.113064e+307 -1.169017e+308 Inf Inf
#[6] NaN
tail(y1)
#[1] -0.3039989 -0.8127365 -1.3859780 Inf Inf NaN