I want to regress each of a subset of columns of a data frame on another column of the same data frame (and on top on a subset of observations), and then append the fitted values as new and named columns to the original data frame. As an example, I'll use the EuStockMarkets data that comes with R, transformed into a data frame df
.
df <- zoo::fortify.zoo(EuStockMarkets)
I would like to regress the columns DAX
, SMI
, CAC
, and FTSE
on Index
> colnames(df)
[1] "Index" "DAX" "SMI" "CAC" "FTSE"
for values of Index
up to Index == 1996.877
using {purrr}
functions to avoid a loop. Then, add the fitted values as new columns to df
with names DAX_fitted
, SMI_fitted
, CAC_fitted
, and FTSE_fitted
.
I came up with two options until now:
fitted <- df %>%
select(-Index) %>%
names() %>%
paste(.,' ~ Index') %>%
map(as.formula) %>%
map2(., .y = rep(list(df %>% filter(Index < 1996.877)), length(.)), ~ predict(lm(.x, data = .y)))
which gives me a list of the fitted values or
lm <- df %>%
select(-Index) %>%
names() %>%
paste(.,' ~ Index') %>%
map(as.formula) %>%
map(lm, data = df %>% filter(Index < 1996.877))
which returns a list lm
with regression results.
Ideas on how to complete these code lines to add the named fitted values to the original data frame? Thanks!
With dplyr
, you can use mutate()
+ across()
to achieve that:
library(dplyr)
df %>%
filter(Index < 1996.877) %>%
mutate(across(-Index, ~ lm(.x ~ Index)$fitted.values,
.names = "{.col}_fitted"))
# # A tibble: 1,400 × 9
# Index DAX SMI CAC FTSE DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1991. 1629. 1678. 1773. 2444. 1465. 1535. 1865. 2363.
# 2 1992. 1614. 1688. 1750. 2460. 1465. 1536. 1865. 2364.
# 3 1992. 1607. 1679. 1718 2448. 1466. 1538. 1865. 2365.
# 4 1992. 1621. 1684. 1708. 2470. 1467. 1539. 1865. 2366.
# 5 1992. 1618. 1687. 1723. 2485. 1468. 1541. 1865. 2367.
# 6 1992. 1611. 1672. 1714. 2467. 1468. 1542. 1865. 2368.
# 7 1992. 1631. 1683. 1734. 2488. 1469. 1544. 1865. 2369.
# 8 1992. 1640. 1704. 1757. 2508. 1470. 1545. 1866. 2370.
# 9 1992. 1635. 1698. 1754 2510. 1471. 1547. 1866. 2371.
# 10 1992. 1646. 1716. 1754. 2497. 1471. 1548. 1866. 2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows
To minimally modify the code you have tried, you just need set_names()
and map_dfc()
:
library(purrr)
df %>%
select(-Index) %>%
names() %>%
set_names(paste0, "_fitted") %>%
map_dfc(~ lm(as.formula(paste(.x, "~ Index")), filter(df, Index < 1996.877))$fitted.values)
# # A tibble: 1,400 × 4
# DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
# <dbl> <dbl> <dbl> <dbl>
# 1 1465. 1535. 1865. 2363.
# 2 1465. 1536. 1865. 2364.
# 3 1466. 1538. 1865. 2365.
# 4 1467. 1539. 1865. 2366.
# 5 1468. 1541. 1865. 2367.
# 6 1468. 1542. 1865. 2368.
# 7 1469. 1544. 1865. 2369.
# 8 1470. 1545. 1866. 2370.
# 9 1471. 1547. 1866. 2371.
# 10 1471. 1548. 1866. 2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows