I'm trying to calculate the following for a data set, to learn some time series analysis, and then block bootstrap the standard errors for individuals:
Here's the data set:
https://www.dropbox.com/s/z066lnxetz9uaf6/health.csv?dl=0
And here is the code I've written for the correlations:
#Check for duplicates
health.d <- health.d[!duplicated(health.d),]
health.d$lnincome <- log(health.d$Income + 1)
health.d <- health.d[(health.d$sex == 1 & health.d$married == 0),]
#First Difference for each individual ( %>% , group_by and mutate are functions in dplyr package)
health.d <- health.d %>%
  group_by(ID) %>%
  mutate(Dy = lnincome - lag(lnincome, 1))
#Remove NA from Dy
health.d <- health.d[!is.na(health.d$Dy),]
#Autocorrelation
health.d <- arrange(health.d, ID, year)
health.d <- transform(health.d, time = as.numeric(interaction(ID, drop=TRUE)))
health.d$lag1DY <- health.d$lnincome - lag(health.d$lnincome, 1)
health.d$lagDY_s1 <- lag(health.d$lnincome,1) - lag(health.d$lnincome, 2)
health.d$lagDY_s2 <- lag(health.d$lnincome,2) - lag(health.d$lnincome, 3)
health.d$lagDY_s3 <- lag(health.d$lnincome,3) - lag(health.d$lnincome, 4)
health.d$lagDY_s4 <- lag(health.d$lnincome,4) - lag(health.d$lnincome, 5)
#Remove NA from lag
health.d <- health.d[!is.na(health.d$lag1DY),]
health.d <- health.d[!is.na(health.d$lagDY_s1),]
health.d <- health.d[!is.na(health.d$lagDY_s2),]
health.d <- health.d[!is.na(health.d$lagDY_s3),]
health.d <- health.d[!is.na(health.d$lagDY_s4),]
cor(health.d$lag1DY, health.d$lagDY_s1)
cor(health.d$lag1DY, health.d$lagDY_s2)
cor(health.d$lag1DY, health.d$lagDY_s3)
cor(health.d$lag1DY, health.d$lagDY_s4)
Results :
> cor(health.d$lag1DY, health.d$lagDY_s1)
[1] -0.05593212
> cor(health.d$lag1DY, health.d$lagDY_s2)
[1] -0.1033625
> cor(health.d$lag1DY, health.d$lagDY_s3)
[1] -0.0804236
> cor(health.d$lag1DY, health.d$lagDY_s4)
[1] -0.1235624
These seem wrong as there should be high correlation between the time periods due to the income, but I can't figure out what I have done wrong.
Edit: I've updated my code to include the current results I've reached. These don't appear to be correct, but (1) I don't know the correct numbers, and (2) I don't know where my code is wrong. I'm posting my current results in hope someone can correct me :)
Any help with a block bootstrap on the standard errors?
Thanks in advance.
Probably all you need is the acf function in the stats package. It computes autocorrelations for as many lags as you ask for.
library(stats) # for the "acf" function (attached by default in R)
library(dplyr) # for %>%, group_by and mutate
health.d <- health.d[!duplicated(health.d),]
health.d$lnincome <- log(health.d$Income + 1)
health.d <- health.d[(health.d$sex == 1 & health.d$married == 0),]
#First Difference for each individual ( %>% , group_by and mutate are functions in dplyr package)
health.d <- health.d %>%
  group_by(ID) %>%
  mutate(Dy = lnincome - lag(lnincome, 1))
acf.results<-acf(health.d$Dy, lag.max = 5, type = "correlation",plot = TRUE, na.action = na.pass)
plot(acf.results, main="Auto-correlation")
This will give you a plot of the autocorrelations up to the 5 lags specified in the lag.max argument of acf.
If you want to access the acf results you can use:
print(acf.results)
and you will get the following
Autocorrelations of series ‘health.d$Dy’, by lag

     0      1      2      3      4      5
 1.000 -0.225  0.016 -0.030 -0.002  0.002