
Combine two data frames to make projections from the previous result of a cross_join and a growth rate


In my previous question here, I did not think of using the previously calculated value to produce the required projection with an incremental growth rate (H). Two contributors corrected my earlier attempt. I used:

mutate(cross_join(df1[-3], df2), across(D:F)*H, H = NULL)

Now, however, I would like each new value to be the previous result of across(D:F) multiplied by H. Something like this:

df1:
A   B   year    D   E   F
abc ab  2020    0   1   2
def cd  2020    3   4   0
ghi ef  2020    0   5   6
jkl gh  2020    7   8   0
mno ij  2020    0   9   10

df2: 
year    H
2021    1.1
2022    1.2
2023    1.3
2024    1.4
2025    1.5

df3 # for abc ab
A   B   year  D  E     F     H
abc ab  2020  0  1.00  2.00  1
abc ab  2021  0  1.10  2.20  1.1
abc ab  2022  0  1.32  2.64  1.2
abc ab  2023  0  1.72  3.43  1.3
abc ab  2024  0  2.40  4.80  1.4
abc ab  2025  0  3.60  7.21  1.5

As can be seen, the values now grow cumulatively according to the H growth rate. I think the formula is something like this:

X_m = X_n * (1 + H_m)

where
m = n + 1
X_n = across(D:F)
H_m = H
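
As a quick check of the recurrence against the df3 table, here is a minimal sketch for column E of "abc ab". In df3 the H values (1.1, 1.2, ...) act as the multiplier itself, so the sketch multiplies by H rather than by (1 + H); that reading is inferred from the expected values. The names `E_2020` and `E_proj` are illustrative only.

```r
# Multipliers for 2020-2025; the leading 1 for 2020 is taken from the
# H column of the expected df3 table.
H <- c(1, 1.1, 1.2, 1.3, 1.4, 1.5)
E_2020 <- 1  # value of E for "abc ab" in df1

# Each year's value is the previous year's value times that year's H,
# i.e. the 2020 value times the running product of H.
E_proj <- E_2020 * cumprod(H)
round(E_proj, 2)
```

Rounded to two decimals, this reproduces the E column of df3.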

df4 # for def cd
A   B   year  D     E     F  H
def cd  2020  3     4     0  1
def cd  2021  3.3   4.4   0  1.1
def cd  2022  4.0   5.3   0  1.2
def cd  2023  5.1   6.9   0  1.3
def cd  2024  7.2   9.6   0  1.4
def cd  2025  10.8  14.4  0  1.5

I have tried to do something like this:

mutate(cross_join(df1[-3], df2), across(D:F)[-1]*H, H = NULL)

I also tried the lag and lead functions combined with across, but that did not work either.

Any help is appreciated.

Edited: new try

Try 1:

In df2, I have added an example column showing what I expect. The first value, 0.1, is given (e.g. from df1). The second is 0.1 * (H + 1), and so on: each value is computed from the previous one in a cumulative manner.

Time      H   example
2010  0.000 0.1000000
2011  0.063 0.1062585
2012  0.049 0.1114821
2013  0.061 0.1182550
2014  0.057 0.1250279
2015  0.028 0.1285086
2016  0.060 0.1361788
2017  0.058 0.1441024
2018 -0.049 0.1370985
2019  0.058 0.1449823
2020 -0.149 0.1233892
2021  0.159 0.1430246
2022  0.076 0.1538541
2023  0.168 0.1796427
2024  0.144 0.2054314
2025  0.126 0.2312200
2026  0.112 0.2570086
2027  0.100 0.2827973
2028  0.091 0.3085859
2029  0.084 0.3343745
2030  0.077 0.3601632

I tried this:
df2_1 <- df2 %>% mutate(example2 = cumsum(0.1 * (H+1)))
head(df2_1)

Time     H   example example2
2010 0.000 0.1000000   0.1000
2011 0.063 0.1062585   0.2063
2012 0.049 0.1114821   0.3112
2013 0.061 0.1182550   0.4173
2014 0.057 0.1250279   0.5230
2015 0.028 0.1285086   0.6258

Try 2:

df2_1 <- df2 %>% mutate(example2 = accumulate(H, ~ 0.1 * ( .x +1)))
head(df2_1)

Time     H   example example2
2010 0.000 0.1000000  0.00000
2011 0.063 0.1062585  0.10000
2012 0.049 0.1114821  0.11000
2013 0.061 0.1182550  0.11100
2014 0.057 0.1250279  0.11110
2015 0.028 0.1285086  0.11111

This still does not work.

Once it works for this example, I will try to insert it into the cross_join and across mutate call.
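
For reference, a possible reading of why both tries miss the target. `cumsum(0.1 * (H+1))` adds the terms up instead of compounding them. In `accumulate(H, ~ 0.1 * (.x + 1))`, `.x` is the running result and the current H element (`.y`) is never used, which gives 0, 0.1, 0.11, 0.111, and so on. A minimal base R sketch of the compounding version follows, using the first six H values from the table; the names `start`, `example_cumprod` and `example_reduce` are illustrative only.

```r
# First six growth rates from the H column above (rates, not multipliers).
H <- c(0.000, 0.063, 0.049, 0.061, 0.057, 0.028)
start <- 0.1  # the given starting value for 2010

# Running product: value_t = start * (1 + H_1) * ... * (1 + H_t)
example_cumprod <- start * cumprod(1 + H)

# The same recurrence written step by step with base R's Reduce();
# H[-1] skips the first rate because `start` already stands for 2010.
example_reduce <- Reduce(function(prev, h) prev * (1 + h), H[-1],
                         accumulate = TRUE, init = start)

example_cumprod
```

Inside a dplyr pipeline the same idea would be `mutate(example2 = 0.1 * cumprod(1 + H))`. The results agree with the example column only to about three decimal places, presumably because the printed H values are rounded.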


Solution

  • Building off your other question (which is quite similar in some ways):

    library(dplyr)

    df2 %>%
        add_row(year=2020, H=1, .before=1) %>% # create a 2020 row, because the desired output has this
        cross_join(df1) %>% 
        group_by(A, B) %>% # so the running product restarts for each A/B pair
        mutate(across(c(D:F), ~ . * cumprod(H))) %>% # multiply D, E, and F by the running product of H ("." is a stand-in for the columns)
        select(A, B, year = year.x, D:F, H) %>% # select the columns we want
        ungroup() %>%
        arrange(A, B, year)
    

    I thought I might've had to use rowwise(), but it seems to work without it!
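
To check a pipeline of this kind against the expected df3 and df4 tables, here is a self-contained sketch that rebuilds df1 and df2 from the question. It assumes dplyr 1.1.0 or later (for `cross_join`) and that the desired output is the 2020 value times the running product of H within each A/B pair; the name `proj` is illustrative only.

```r
library(dplyr)

# Data copied from the question
df1 <- data.frame(
  A = c("abc", "def", "ghi", "jkl", "mno"),
  B = c("ab", "cd", "ef", "gh", "ij"),
  year = 2020,
  D = c(0, 3, 0, 7, 0),
  E = c(1, 4, 5, 8, 9),
  F = c(2, 0, 6, 0, 10)
)
df2 <- data.frame(year = 2021:2025, H = c(1.1, 1.2, 1.3, 1.4, 1.5))

proj <- df2 %>%
  tibble::add_row(year = 2020, H = 1, .before = 1) %>%  # base-year row
  cross_join(df1) %>%                       # every year paired with every df1 row
  group_by(A, B) %>%                        # restart the product per A/B pair
  mutate(across(D:F, ~ .x * cumprod(H))) %>%  # compound the 2020 values
  ungroup() %>%
  select(A, B, year = year.x, D:F, H) %>%
  arrange(A, B, year)

# Compare with df4: the "def cd" rows, rounded to one decimal
proj %>%
  filter(A == "def") %>%
  mutate(across(D:F, ~ round(.x, 1)))
```

Because `cross_join` keeps the rows of df2 in year order, each A/B group is already sorted by year when `cumprod(H)` runs; if the input order is not guaranteed, an `arrange(year)` before the `mutate` would be a safer choice.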