This is an example of my data:
id <- c(1,1,1,1,2,2,3,3,3,3,4,4,4)
Affect <- c(0.8, 0.5, NA, 0.8, 0.2, 0.1, 0.7, 1.1, 0.9, 0.5, 0.3, NA, 0.9)
Paranoia <- c(0.9, 0.6, 0.4, 0.2, 0.1, NA, 0.3, 0.1, 0.9, 1.5, 0.4, 0.1, 0.6)
both <- data.frame(id, Affect, Paranoia)
Now I calculate a cross-correlation for each ID separately, which gives me a list:
library(tseries)
library(dplyr)
library(tidyr)
out <- both %>%
  group_by(id) %>%
  filter(!(all(is.na(Affect)) | all(is.na(Paranoia)))) %>%
  mutate_at(vars(Affect, Paranoia), replace_na, 0) %>%
  dplyr::summarise(ccfout = list(ccf(Affect, Paranoia, ylim = c(-10, 10), lag.max = 5)))
What I want to do now is find, for each ID, the lag at which the correlation is at its maximum and the correlation value at that lag. I tried this, but it didn't work, probably because I have a list for each ID:
Find_Max_CCF <- function(Affect, Paranoia)
{
  d <- both %>%
    group_by(id) %>%
    filter(!(all(is.na(Affect)) | all(is.na(Paranoia)))) %>%
    mutate_at(vars(Affect, Paranoia), replace_na, 0) %>%
    dplyr::summarise(ccfout = list(ccf(Affect, Paranoia, ylim = c(-10, 10))))
  cor = d$acf[,,1]
  lag = d$lag[,,1]
  res = data.frame(cor, lag)
  res_max = res[which.max(res$cor),]
  return(res_max)
}
Find_Max_CCF(both)
The error message is:
1: Unknown or uninitialised column: 'acf'.
2: Unknown or uninitialised column: 'lag'.
3: Unknown or uninitialised column: 'acf'.
4: Unknown or uninitialised column: 'lag'
Do you have any ideas? Thanks a lot in advance.
The problem is that the column ccfout you create contains lists of acf objects, whereas you want them to be data frames to be able to slice the way you try to.
I wrote a function ccf_as_df that instead returns lists of data.frame objects, with columns lag and ccf, by extracting those from the acf object that ccf() returns.
ccf_as_df <- function(x, y) {
  # calculate the ccf and return it as a one-element list holding a data frame
  # with columns `lag` and `ccf`
  ccf_obj <- ccf(x, y, ylim = c(-10, 10), lag.max = 5, plot = FALSE)
  ccf_df <- data.frame(lag = as.vector(ccf_obj$lag), ccf = as.vector(ccf_obj$acf))
  return(list(ccf_df))
}
out <- both %>%
  group_by(id) %>%
  filter(!(all(is.na(Affect)) | all(is.na(Paranoia)))) %>%
  mutate_at(vars(Affect, Paranoia), replace_na, 0) %>%
  summarise(ccfout = ccf_as_df(Affect, Paranoia))
Now, the ccfout column contains lists of data frames, which you can unnest to get a data frame with three columns: id, lag and ccf.
This can then be grouped by id to get the maximum ccf and the lag at which this occurs:
out %>%
  unnest(ccfout) %>%
  group_by(id) %>%
  summarise(max_ccf = max(ccf),
            max_ccf_lag = lag[which.max(ccf)])
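If you only need the peak and never the full ccf table, the list column can be skipped altogether. The following is a minimal base R sketch of that idea, not a replacement for the pipeline above: the helper name max_ccf_row is made up for illustration, and it assumes the same NA handling as in the question (NA replaced by 0, ids with an all-NA column dropped).

```r
# Example data from the question
id <- c(1,1,1,1,2,2,3,3,3,3,4,4,4)
Affect <- c(0.8, 0.5, NA, 0.8, 0.2, 0.1, 0.7, 1.1, 0.9, 0.5, 0.3, NA, 0.9)
Paranoia <- c(0.9, 0.6, 0.4, 0.2, 0.1, NA, 0.3, 0.1, 0.9, 1.5, 0.4, 0.1, 0.6)
both <- data.frame(id, Affect, Paranoia)

# Hypothetical helper: returns a one-row data frame with the largest
# cross-correlation and the lag at which it occurs
max_ccf_row <- function(x, y, lag.max = 5) {
  # Same NA handling as in the question: replace NA with 0
  x[is.na(x)] <- 0
  y[is.na(y)] <- 0
  obj <- stats::ccf(x, y, lag.max = lag.max, plot = FALSE)
  i <- which.max(obj$acf)  # position of the largest (signed) correlation
  data.frame(max_ccf = obj$acf[i], max_ccf_lag = obj$lag[i])
}

# Split by id and drop ids where either column is entirely NA
groups <- Filter(
  function(g) !(all(is.na(g$Affect)) || all(is.na(g$Paranoia))),
  split(both, both$id)
)

# Compute the peak per id and bind the one-row results together
pieces <- lapply(groups, function(g) {
  cbind(id = g$id[1], max_ccf_row(g$Affect, g$Paranoia))
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```

Note that which.max() picks the largest signed value. If a strong negative correlation should also count as a peak, which.max(abs(obj$acf)) would be the variant to try.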