test_data <- cbind(Fund1 = c(NA, NA, NA,1,5,6,7,8,9,10),
Fund2 = c(NA, 1,2,4,5,6,7,5,NA,NA),
Fund3 = c(NA,2,4,5,6,7,5,4,NA,NA),
Fund4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
peer_average <- rowMeans(as.data.frame(test_data), na.rm = TRUE)
test_data <- cbind(test_data, data.frame(peer_average))
I want to perform a regression of Fund 1, Fund 2 and Fund 3 against peer_average. In practice, I have a much larger dataframe, so want to make this simple to extend.
My goal is to have output_matrix1 = 4 Beta Coffecients, output_matrix2 = 4 alpha coefficients and output_matrix3 = 4 r-squared values, from each respective regression.
I know to run individual regressions as: lm(y ~ x)
, but I am not sure how to address NA's in the series. Originally I thought lapply
would work but have not been able to figure it out. I want regressions to be calculated on all pairwise overlapping series. The problem with running lm(y~x) in this case, different to previous examples: Fitting a linear model with multiple LHS is Fund4
Here is how I would do it. Similar to the other answer, I would go from wide to long format (one function), then I would nest, map out the regressions, and pull out the coefficients:
library(tidyverse)
pivot_longer(test_data,
-peer_average,
names_to = "Fund",
names_pattern = "Fund(\\d+)",
values_drop_na = TRUE) |>
nest(model = -Fund) |>
mutate(model = map(model, ~summary(lm(peer_average~value, data = .x))),
R2 = map_dbl(model, ~.x$r.squared),
model = map(model, ~broom::tidy(.x)$estimate),
model = map(model, ~set_names(.x, c("Intercept", "Coef")))) |>
unnest_wider(model)
#> # A tibble: 3 x 4
#> Fund Intercept Coef R2
#> <chr> <dbl> <dbl> <dbl>
#> 1 2 0.880 0.845 0.915
#> 2 3 0.273 0.897 0.580
#> 3 1 2.12 0.677 0.805