Search code examples
rdata.tablegroupinglag

Create lag variable with conditions and group by id


I am struggling to create a new variable that captures the stock price (prc) at the begin of a fiscal year (fyear) for a given firm.

In the data the fiscal year is defined with a start and end date, complemented with the monthly stock price. The stock price is based on the price of the last trading day of the month, and therefore is not always on the last day of the month.

For example: as the fiscal year starts on the first of January of 2001 I would like to get the stock price at the end of December 2000.

Here's a sample of the data:

dt <- data.table(id = rep(c(59328, 61241), each = 36), fyear = c(rep(2001, 
    each = 12), rep(2002, each = 12), rep(2003, each = 12), rep(2001, 
    each = 12), rep(2002, each = 12), rep(2003, each = 12)), 
    fyear_start = as.Date(c(rep("2001-01-01", each = 12), rep("2002-01-01", 
        each = 12), rep("2003-01-01", each = 12), rep("2000-07-01", 
        each = 12), rep("2001-07-01", each = 12), rep("2002-07-01", 
        each = 12))), fyear_end = as.Date(c(rep("2001-12-31", 
        each = 12), rep("2002-12-31", each = 12), rep("2003-12-31", 
        each = 12), rep("2001-06-30", each = 12), rep("2002-06-30", 
        each = 12), rep("2003-06-30", each = 12))), prc_month_end = as.Date(c("2001-01-31", 
        "2001-02-28", "2001-03-30", "2001-04-30", "2001-05-31", 
        "2001-06-29", "2001-07-31", "2001-08-31", "2001-09-28", 
        "2001-10-31", "2001-11-30", "2001-12-31", "2002-01-31", 
        "2002-02-28", "2002-03-28", "2002-04-30", "2002-05-31", 
        "2002-06-28", "2002-07-31", "2002-08-30", "2002-09-30", 
        "2002-10-31", "2002-11-29", "2002-12-31", "2003-01-31", 
        "2003-02-28", "2003-03-31", "2003-04-30", "2003-05-30", 
        "2003-06-30", "2003-07-31", "2003-08-29", "2003-09-30", 
        "2003-10-31", "2003-11-28", "2003-12-31", "2000-07-31", 
        "2000-08-31", "2000-09-29", "2000-10-31", "2000-11-30", 
        "2000-12-29", "2001-01-31", "2001-02-28", "2001-03-30", 
        "2001-04-30", "2001-05-31", "2001-06-29", "2001-07-31", 
        "2001-08-31", "2001-09-28", "2001-10-31", "2001-11-30", 
        "2001-12-31", "2002-01-31", "2002-02-28", "2002-03-28", 
        "2002-04-30", "2002-05-31", "2002-06-28", "2002-07-31", 
        "2002-08-30", "2002-09-30", "2002-10-31", "2002-11-29", 
        "2002-12-31", "2003-01-31", "2003-02-28", "2003-03-31", 
        "2003-04-30", "2003-05-30", "2003-06-30")), prc = c(37, 
        28.56, 26.31, 30.91, 27.01, 29.25, 29.81, 27.96, 20.44, 
        24.42, 32.66, 31.45, 35.04, 28.55, 30.41, 28.61, 27.62, 
        18.27, 18.79, 16.67, 13.89, 17.3, 20.88, 15.57, 15.7, 
        17.26, 16.28, 18.37, 20.82, 20.81, 24.89, 28.59, 27.52, 
        32.95, 33.54, 32.05, 24.6, 21.5, 26.54, 31, 28.25, 28.9, 
        18.26, 13.55, 8.15, 9.84, 13.56, 15.86, 16.05, 13.5, 
        14.71, 11.18, 11.43, 9.72, 8.03, 8.85, 5.34, 6.14, 9, 
        6.46, 5.24, 5.49, 6.18, 7.44, 7.28, 6.41, 7.3, 11.29, 
        11.11, 15.2, 17.97, 14.9))

First three rows:

       id fyear fyear_start  fyear_end prc_month_end   prc
 1: 59328  2001  2001-01-01 2001-12-31    2001-01-31 37.00
 2: 59328  2001  2001-01-01 2001-12-31    2001-02-28 28.56
 3: 59328  2001  2001-01-01 2001-12-31    2001-03-30 26.31

I've read the following posts for guidance, but I didn't get the expected result.

  1. How to loop lapply to create LAG terms over multiple variables in R

    • Using the solution of thelatemail, I can make a lagged variable of the stock price. However, it takes the previous month stock price, and dit not take into account the fiscal year.
    vars <- c("prc")
    rpv <- rep(1:2, each=length(vars))
    dt_test <- dt[, paste(vars, "lag", rpv, sep="_") := Map(shift, .SD, rpv), by=id, .SDcols=vars]
    
  2. How to create lag variables

    • Same as above, lag variable stock price is based on the previous month.
  3. Create lead and lag variables in R
    • Same as above, lag variable stock price is based on the previous month.

Can't use the .SD[1]/.N statement of data.table, as it returns the first/last month of the fiscal year, and not the last of the previous fiscal year.

Is there a way to return for a fiscal year the last monthly stock price at previous fiscal year?

The desired result is as follows:

output <- data.table(id = rep(c(59328, 61241), each = 3), fyear = c(2001, 
    2002, 2003, 2001, 2002, 2003), fyear_start = as.Date(c("2001-01-01", 
    "2002-01-01", "2003-01-01", "2000-07-01", "2001-07-01", "2002-07-01")), 
    fyear_end = as.Date(c("2001-12-31", "2002-12-31", "2003-12-31", 
        "2001-06-30", "2002-06-30", "2003-06-30")), begin_prc = c(NA, 
        31.45, 15.57, NA, 15.86, 6.46))

      id fyear fyear_start  fyear_end begin_prc
1: 59328  2001  2001-01-01 2001-12-31        NA
2: 59328  2002  2002-01-01 2002-12-31     31.45
3: 59328  2003  2003-01-01 2003-12-31     15.57
4: 61241  2001  2000-07-01 2001-06-30        NA
5: 61241  2002  2001-07-01 2002-06-30     15.86
6: 61241  2003  2002-07-01 2003-06-30      6.46

I'd appreciate some assistance. Thanks in advance.


Solution

  • Is there a way to return for a fiscal year the last monthly stock price at previous fiscal year?

    out = unique(dt[, .(id, fyear, fyear_start, fyear_end)])
    
    out[, prc_end := {
      dt[.(id = .SD$id, prc_month_end = .SD$fyear_start - 1L), on=.(id, prc_month_end), roll=TRUE, x.prc]
    }]
    
          id fyear fyear_start  fyear_end prc_end
    1: 59328  2001  2001-01-01 2001-12-31      NA
    2: 59328  2002  2002-01-01 2002-12-31   31.45
    3: 59328  2003  2003-01-01 2003-12-31   15.57
    4: 61241  2001  2000-07-01 2001-06-30      NA
    5: 61241  2002  2001-07-01 2002-06-30   15.86
    6: 61241  2003  2002-07-01 2003-06-30    6.46
    

    This is a rolling update join: For rows of table out

    • Construct the lookup vectors .(id, fyear_start - 1) using .SD = out, the subset of data
    • Lookup rows of dt, "rolling" the last vector, fyear_start - 1, to the nearest earlier date
    • Take matched values of x.prc, the prc column from dt

    The notation x.* comes from the x[i] join/lookup syntax. For more details, see ?data.table.