Search code examples
rdataframedata-scienceprojection

combine 2 df with a for loop to make projections


I have 2 data frames, both with strings, dates, and numbers. df1 is data for 2020 and df2 is data for 2021-2025. I will use df2 (column H) as a growth rate on df1. I need to multiply all numbers of df1 (columns D, E, F) by df2 (H[i,]) for each year in df2, from 2021 to 2025.

I have structured a function, but I am still thinking about how to address it. Could you please check my code and provide me with some ideas to complete the function? I really appreciate your help.

df1 <- read.csv("df1.csv", check.names=FALSE)
df2 <- read.csv("df2.csv", check.names=FALSE)

df1:
A   B   year    D   E   F
abc ab  2020    0   1   2
def cd  2020    3   4   0
ghi ef  2020    0   5   6
jkl gh  2020    7   8   0
mno ij  2020    0   9   10

df2: 
year    H
2021    1.1
2022    1.2
2023    1.3
2024    1.4
2025    1.5

df3 <- data.frame()
for (i in 1:length(df2)){
  df3 = rbind(df1, df2 %>% 
        mutate(df1$all_columns_with_numbers = all_columns_with_numbers[i,] * df2$H[i,] ))
}
df3

A    B     C    D    E      F
abc ab  2021    0    1.1    2.2
abc ab  2022    3.6  4.8    0
abc ab  2023    0    6.5    7.8
abc ab  2024    9.8  11.2   0
abc ab  2025    0    13.5   15
def cd  2021    …    …      …

Solution

  • It sounds like what you're wanting to do is a kind of cross join, followed by mutiplying the columns D, E, and F by H. That is shown below:

    library(tidyverse)
    
    df1 %>%
        select(-year) %>%
        cross_join(df2) %>%
        # multiply columns D, E, F by H
        mutate(across(c(D, E, F), ~ . * H)) %>%
        select(-H)
    
    # A tibble: 25 × 6
       A     B         D     E     F  year
       <chr> <chr> <dbl> <dbl> <dbl> <int>
     1 abc   ab      0     1.1   2.2  2021
     2 abc   ab      0     1.2   2.4  2022
     3 abc   ab      0     1.3   2.6  2023
     4 abc   ab      0     1.4   2.8  2024
     5 abc   ab      0     1.5   3    2025
     6 def   cd      3.3   4.4   0    2021
     7 def   cd      3.6   4.8   0    2022
     8 def   cd      3.9   5.2   0    2023
     9 def   cd      4.2   5.6   0    2024
    10 def   cd      4.5   6     0    2025
    # ℹ 15 more rows
    

    Though, after speaking with Onyambu, and reading through your question again, I'm now unsure! Maybe what you want is some way of getting D, E, and F multiplied by H for each year, something like this:

    grow_year <- function(df) {
        df %>%
        mutate(year = year + 1) %>%
        left_join(df2, by = "year") %>%
        mutate(across(c(D, E, F), ~ . * H)) %>%
        select(-H)
    }
    y21 <- grow_year(df1)
    y22 <- grow_year(y21)
    y23 <- grow_year(y22)
    y24 <- grow_year(y23)
    y25 <- grow_year(y24)
    
    out <- bind_rows(df1, y21, y22, y23, y24, y25)
    
    # A tibble: 30 × 6
       A     B      year     D     E     F
       <chr> <chr> <dbl> <dbl> <dbl> <dbl>
     1 abc   ab     2020   0     1     2  
     2 def   cd     2020   3     4     0  
     3 ghi   ef     2020   0     5     6  
     4 jkl   gh     2020   7     8     0  
     5 mno   ij     2020   0     9    10  
     6 abc   ab     2021   0     1.1   2.2
     7 def   cd     2021   3.3   4.4   0  
     8 ghi   ef     2021   0     5.5   6.6
     9 jkl   gh     2021   7.7   8.8   0  
    10 mno   ij     2021   0     9.9  11  
    # ℹ 20 more rows