I have 2 data frames, both with strings, dates, and numbers. df1 is data for 2020 and df2 is data for 2021-2025. I will use df2 (column H) as a growth rate on df1. I need to multiply all numbers of df1 (columns D, E, F) by df2 (H[i,]) for each year in df2, from 2021 to 2025.
I have structured a function, but I am still thinking about how to address it. Could you please check my code and provide me with some ideas to complete the function? I really appreciate your help.
df1 <- read.csv("df1.csv", check.names=FALSE)
df2 <- read.csv("df2.csv", check.names=FALSE)
df1:
A B year D E F
abc ab 2020 0 1 2
def cd 2020 3 4 0
ghi ef 2020 0 5 6
jkl gh 2020 7 8 0
mno ij 2020 0 9 10
df2:
year H
2021 1.1
2022 1.2
2023 1.3
2024 1.4
2025 1.5
df3 <- data.frame()
for (i in 1:length(df2)){
df3 = rbind(df1, df2 %>%
mutate(df1$all_columns_with_numbers = all_columns_with_numbers[i,] * df2$H[i,] ))
}
df3
A B C D E F
abc ab 2021 0 1.1 2.2
abc ab 2022 3.6 4.8 0
abc ab 2023 0 6.5 7.8
abc ab 2024 9.8 11.2 0
abc ab 2025 0 13.5 15
def cd 2021 … … …
It sounds like what you're wanting to do is a kind of cross join, followed by mutiplying the columns D, E, and F by H. That is shown below:
library(tidyverse)
df1 %>%
select(-year) %>%
cross_join(df2) %>%
# multiply columns D, E, F by H
mutate(across(c(D, E, F), ~ . * H)) %>%
select(-H)
# A tibble: 25 × 6
A B D E F year
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 abc ab 0 1.1 2.2 2021
2 abc ab 0 1.2 2.4 2022
3 abc ab 0 1.3 2.6 2023
4 abc ab 0 1.4 2.8 2024
5 abc ab 0 1.5 3 2025
6 def cd 3.3 4.4 0 2021
7 def cd 3.6 4.8 0 2022
8 def cd 3.9 5.2 0 2023
9 def cd 4.2 5.6 0 2024
10 def cd 4.5 6 0 2025
# ℹ 15 more rows
Though, after speaking with Onyambu, and reading through your question again, I'm now unsure! Maybe what you want is some way of getting D, E, and F multiplied by H for each year, something like this:
grow_year <- function(df) {
df %>%
mutate(year = year + 1) %>%
left_join(df2, by = "year") %>%
mutate(across(c(D, E, F), ~ . * H)) %>%
select(-H)
}
y21 <- grow_year(df1)
y22 <- grow_year(y21)
y23 <- grow_year(y22)
y24 <- grow_year(y23)
y25 <- grow_year(y24)
out <- bind_rows(df1, y21, y22, y23, y24, y25)
# A tibble: 30 × 6
A B year D E F
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 abc ab 2020 0 1 2
2 def cd 2020 3 4 0
3 ghi ef 2020 0 5 6
4 jkl gh 2020 7 8 0
5 mno ij 2020 0 9 10
6 abc ab 2021 0 1.1 2.2
7 def cd 2021 3.3 4.4 0
8 ghi ef 2021 0 5.5 6.6
9 jkl gh 2021 7.7 8.8 0
10 mno ij 2021 0 9.9 11
# ℹ 20 more rows