Search code examples
rtransformlag

how to compute lagged investment in r


I have created df1

year gvkey  capex   ppent
2004  1004 13.033 139.137
2005  1004 16.296 213.380
2006  1004 29.891 260.167
2007  1004 30.334 310.393
2008  1004 27.535 245.586
2009  1004 28.855 334.430
...

I have created df2

year gvkey    ROA
2005  1004 0.02796478
2006  1004 0.04665171
2007  1004 0.05976127
2008  1004 0.06255035
2009  1004 0.03549220
2005  1013 0.06882688
...

I wan to create df3

year gvkey    ROA               lag_investment
2005  1004 0.02796478  capex from 2004 / ppent from 2004
2006  1004 0.04665171  capex from 2005 / ppent from 2005
2007  1004 0.05976127  capex from 2006 / ppent from 2006
2008  1004 0.06255035  capex from 2007 / ppent from 2007
2009  1004 0.03549220  capex from 2008 / ppent from 2008
2005  1013 0.06882688  capex from 2004 / ppent from 2004
...

I have over 2,000 firms years. gvkey = firm id

What I basically want to do is the following:

1) compute the investment for the previous year from df1

2) create a column called "lag_investment" in df2

2) insert the value from step 1) in the current year row in df2

Additional question:

How would the code look like if I want to do the following?

I have created df1

  year gvkey        ROA   ppent  capex
1 2004  1004 0.01320911 139.137 13.033
2 2005  1004 0.03005708 213.380 16.296
3 2006  1004 0.05014214 260.167 29.891
4 2007  1004 0.06423255 310.393 30.334
5 2008  1004 0.06723031 245.586 27.535
6 2009  1004 0.03814769 334.430 28.855
...

I want to add a variable to df1

  year gvkey        ROA   ppent  capex         lag_investment
1 2004  1004 0.01320911 139.137 13.033
2 2005  1004 0.03005708 213.380 16.296  capex from 2004 / ppent from 2004
3 2006  1004 0.05014214 260.167 29.891  capex from 2005 / ppent from 2005
4 2007  1004 0.06423255 310.393 30.334  capex from 2006 / ppent from 2006
5 2008  1004 0.06723031 245.586 27.535  capex from 2007 / ppent from 2007
6 2009  1004 0.03814769 334.430 28.855  capex from 2008 / ppent from 2008
...

I want to calculate the lag_investment for all years except 2004.

Thank you so much!!!


Solution

  • With data.table, we can do

    library(data.table)
    setDT(df1)[, lag_investment :=Reduce(`/`, shift(.SD)), .SDcols = c("capex", "ppent")]
    df1
    #   year gvkey        ROA   ppent  capex lag_investment
    #1: 2004  1004 0.01320911 139.137 13.033             NA
    #2: 2005  1004 0.03005708 213.380 16.296     0.09367027
    #3: 2006  1004 0.05014214 260.167 29.891     0.07637079
    #4: 2007  1004 0.06423255 310.393 30.334     0.11489159
    #5: 2008  1004 0.06723031 245.586 27.535     0.09772772
    #6: 2009  1004 0.03814769 334.430 28.855     0.11211958
    

    Or in base R

    df1$lag_investment <- with(df1, c(NA, head(capex, -1)/head(ppent, -1)))
    

    Or it can be written as

    df1$lag_investment <- with(df1, c(NA, capex[-nrow(df1)]/ppent[-nrow(df1)]))
    

    data

    df1 <-  structure(list(year = 2004:2009, gvkey = c(1004L, 1004L, 1004L, 
    1004L, 1004L, 1004L), ROA = c(0.01320911, 0.03005708, 0.05014214, 
    0.06423255, 0.06723031, 0.03814769), ppent = c(139.137, 213.38, 
    260.167, 310.393, 245.586, 334.43), capex = c(13.033, 16.296, 
    29.891, 30.334, 27.535, 28.855)), class = "data.frame", 
    row.names = c("1", 
    "2", "3", "4", "5", "6"))