I have a list of dataframes, each consisting of a date column, a monthly stock return column, and a range of columns with different monthly risk parameters (180, to be exact). I have grouped my data by month and year. The purpose is to iterate through each dataframe in the list and calculate the covariance between the return column and each of the risk parameters, returning the yearly covariance, i.e. 180 covariances per year.
The data looks like this: (https://i.sstatic.net/19jeH.png)
I have figured out how to do this for a single risk parameter in a single dataframe using the following code:
df <- df %>%
  mutate(month = format(date, "%m"), year = format(date, "%Y")) %>%
  group_by(year) %>%
  summarise(cov = cov(ret, Natural.disasters))
This gives the following output: (https://i.sstatic.net/VjzIo.png)
I just can't figure out how to do this between the return and the rest of the risk parameters. Any suggestions?
I tried different loops, but nothing seems to work. I also tried lapply(), but I can't make it work.
I figured it out myself. Here's the code:
library(dplyr)
library(tidyr)

# Split df into a list of dfs by permno
df.list <- split(Merged_df, f = Merged_df$permno)
# Loop through the list of dfs and calculate the yearly
# covariance between return and narratives
cov_df_wide_list <- lapply(df.list, function(df) {
  # create a new column with year information
  df <- df %>%
    mutate(year = lubridate::year(date))
  # reshape the parameter columns to long format, group by year and
  # parameter, and calculate the covariance with the return column
  cov_df <- df %>%
    select(year, permno, Natural.disasters:Revenue.growth, ret) %>%
    gather(key = "param_col", value = "value", -year, -permno, -ret) %>%
    group_by(year, permno, param_col) %>%
    summarize(cov = cov(value, ret), .groups = "drop")
  # spread back to wide format: one row per year, one column per parameter
  cov_df_wide <- cov_df %>%
    spread(key = "param_col", value = "cov")
  return(cov_df_wide)
})
Basically, the function loops through a list of dataframes and computes the yearly covariance between my return column and each of the risk parameters.
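For comparison, here is a minimal sketch of the same calculation without splitting into a list or reshaping. It relies on dplyr's across() inside summarise() and assumes dplyr 1.0 or later. The toy data frame and the column names risk_a and risk_b are made up for illustration; they stand in for Merged_df and its risk parameter columns.

```r
library(dplyr)

# Toy data standing in for Merged_df (hypothetical values):
# two firms (permno), 24 months each, two made-up risk columns.
set.seed(1)
toy <- data.frame(
  permno = rep(c(10001, 10002), each = 24),
  date   = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 24), times = 2),
  ret    = rnorm(48),
  risk_a = rnorm(48),
  risk_b = rnorm(48)
)

# One row per permno and year, one covariance column per risk parameter.
cov_by_year <- toy %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(permno, year) %>%
  summarise(across(risk_a:risk_b, ~ cov(.x, ret)), .groups = "drop")
```

On the real data, the column range would presumably be Natural.disasters:Revenue.growth instead of risk_a:risk_b. The result is a single data frame rather than a list, so split(cov_by_year, cov_by_year$permno) would be needed to get back one data frame per permno.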