in R regress all columns against a single vector and store regression coefficients & R-squared values

test_data <- cbind(Fund1 = c(NA, NA, NA,1,5,6,7,8,9,10),
                   Fund2 = c(NA, 1,2,4,5,6,7,5,NA,NA),
                   Fund3 = c(NA,2,4,5,6,7,5,4,NA,NA),
                   Fund4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
peer_average <- rowMeans(as.data.frame(test_data), na.rm = TRUE)
test_data <- cbind(test_data, data.frame(peer_average))

I want to perform a regression of Fund 1, Fund 2 and Fund 3 against peer_average. In practice, I have a much larger dataframe, so want to make this simple to extend.

My goal is to have output_matrix1 = 4 Beta Coffecients, output_matrix2 = 4 alpha coefficients and output_matrix3 = 4 r-squared values, from each respective regression.

I know to run individual regressions as: lm(y ~ x), but I am not sure how to address NA's in the series. Originally I thought lapply would work but have not been able to figure it out. I want regressions to be calculated on all pairwise overlapping series. The problem with running lm(y~x) in this case, different to previous examples: Fitting a linear model with multiple LHS is Fund4

Solution

Here is how I would do it. Similar to the other answer, I would go from wide to long format (one function), then I would nest, map out the regressions, and pull out the coefficients:

library(tidyverse)

pivot_longer(test_data, 
             -peer_average, 
             names_to = "Fund", 
             names_pattern = "Fund(\\d+)", 
             values_drop_na = TRUE) |>
  nest(model = -Fund) |>
  mutate(model = map(model, ~summary(lm(peer_average~value, data = .x))),
         R2 = map_dbl(model, ~.x$r.squared),
         model = map(model, ~broom::tidy(.x)$estimate),
         model = map(model, ~set_names(.x, c("Intercept", "Coef"))))  |>
  unnest_wider(model)
#> # A tibble: 3 x 4
#>   Fund  Intercept  Coef    R2
#>   <chr>     <dbl> <dbl> <dbl>
#> 1 2         0.880 0.845 0.915
#> 2 3         0.273 0.897 0.580
#> 3 1         2.12  0.677 0.805