Search code examples
rdateif-statementreturn

Calculating Buy and Hold return around event date per ID in R


I wrote a similar question yesterday, but then the problem became more complicated as I proceeded, so I would like to ask again.

I have a question about calculating returns in the following case. For each ID, if Date=EventDate, I hope to calculate "buy and hold return" from 5 days prior to the event date to 5 days after.

To be more specific with the table below, I want to calculate returns around the event date (-5 ~ +5 days) for each ID, where the returns are, for example, (9/10-1), (12/10-1), (14/10-1), ~ , (14/10-1), (17/10-1), (16/10-1) for ID = 1 (That is, the price six days prior to the event date is the denominator in the return calculation.)

+----+------------+-------+------------+
| ID |    Date    | Price | EventDate  |
+----+------------+-------+------------+
|  1 | 2011-03-06 |    10 | NA         |
|  1 | 2011-03-07 |     9 | NA         |
|  1 | 2011-03-08 |    12 | NA         |
|  1 | 2011-03-09 |    14 | NA         |
|  1 | 2011-03-10 |    15 | NA         |
|  1 | 2011-03-11 |    17 | NA         |
|  1 | 2011-03-12 |    12 | 2011-03-12 |
|  1 | 2011-03-13 |    14 | NA         |
|  1 | 2011-03-14 |    17 | NA         |
|  1 | 2011-03-15 |    14 | NA         |
|  1 | 2011-03-16 |    17 | NA         |
|  1 | 2011-03-17 |    16 | NA         |
|  1 | 2011-03-18 |    15 | NA         |
|  1 | 2011-03-19 |    16 | NA         |
|  1 | 2011-03-20 |    17 | NA         |
|  1 | 2011-03-21 |    18 | NA         |
|  1 | 2011-03-22 |    11 | NA         |
|  1 | 2011-03-23 |    15 | NA         |
|  1 | 2011-03-24 |    12 | 2011-03-24 |
|  1 | 2011-03-25 |    13 | NA         |
|  1 | 2011-03-26 |    15 | NA         |
|  2 | 2011-06-11 |    48 | NA         |
|  2 | 2011-06-12 |    49 | NA         |
|  2 | 2011-06-13 |    50 | NA         |
|  2 | 2011-06-14 |    57 | NA         |
|  2 | 2011-06-15 |    60 | NA         |
|  2 | 2011-06-16 |    49 | NA         |
|  2 | 2011-06-17 |    64 | NA         |
|  2 | 2011-06-18 |    63 | NA         |
|  2 | 2011-06-19 |    67 | 2011-06-19 |
|  2 | 2011-06-20 |    70 | NA         |
|  2 | 2011-06-21 |    58 | NA         |
|  2 | 2011-06-22 |    65 | NA         |
|  2 | 2011-06-23 |    57 | NA         |
|  2 | 2011-06-24 |    55 | NA         |
|  2 | 2011-06-25 |    57 | NA         |
|  2 | 2011-06-26 |    60 | NA         |
+----+------------+-------+------------+

Eventually, I hope to make the following table with a new column.

+----+------------+-------+------------+---------------+
| ID |    Date    | Price | EventDate  | BuyHoldReturn |
+----+------------+-------+------------+---------------+
|  1 | 2011-03-06 |    10 | NA         | NA            |
|  1 | 2011-03-07 |     9 | NA         | -0.1          |
|  1 | 2011-03-08 |    12 | NA         | 0.2           |
|  1 | 2011-03-09 |    14 | NA         | 0.4           |
|  1 | 2011-03-10 |    15 | NA         | 0.5           |
|  1 | 2011-03-11 |    17 | NA         | 0.7           |
|  1 | 2011-03-12 |    12 | 2011-03-12 | 0.2           |
|  1 | 2011-03-13 |    14 | NA         | 0.4           |
|  1 | 2011-03-14 |    17 | NA         | 0.7           |
|  1 | 2011-03-15 |    14 | NA         | 0.4           |
|  1 | 2011-03-16 |    17 | NA         | 0.7           |
|  1 | 2011-03-17 |    16 | NA         | 0.6           |
|  1 | 2011-03-18 |    15 | NA         | NA            |
|  1 | 2011-03-19 |    16 | NA         | 0.066666667   |
|  1 | 2011-03-20 |    17 | NA         | 0.133333333   |
|  1 | 2011-03-21 |    18 | NA         | 0.2           |
|  1 | 2011-03-22 |    11 | NA         | -0.266666667  |
|  1 | 2011-03-23 |    15 | NA         | 0             |
|  1 | 2011-03-24 |    12 | 2011-03-24 | -0.2          |
|  1 | 2011-03-25 |    13 | NA         | -0.133333333  |
|  1 | 2011-03-26 |    15 | NA         | 0             |
|  2 | 2011-06-11 |    48 | NA         | NA            |
|  2 | 2011-06-12 |    49 | NA         | NA            |
|  2 | 2011-06-13 |    50 | NA         | NA            |
|  2 | 2011-06-14 |    57 | NA         | 0.14          |
|  2 | 2011-06-15 |    60 | NA         | 0.2           |
|  2 | 2011-06-16 |    49 | NA         | -0.02         |
|  2 | 2011-06-17 |    64 | NA         | 0.28          |
|  2 | 2011-06-18 |    63 | NA         | 0.26          |
|  2 | 2011-06-19 |    67 | 2011-06-19 | 0.34          |
|  2 | 2011-06-20 |    70 | NA         | 0.4           |
|  2 | 2011-06-21 |    58 | NA         | 0.16          |
|  2 | 2011-06-22 |    65 | NA         | 0.3           |
|  2 | 2011-06-23 |    57 | NA         | 0.14          |
|  2 | 2011-06-24 |    55 | NA         | 0.1           |
|  2 | 2011-06-25 |    57 | NA         | NA            |
|  2 | 2011-06-26 |    60 | NA         | NA            |
+----+------------+-------+------------+---------------+

I have an idea of using the code below but couldn't figure out how to calculate the 11 buy and hold returns around the event date.

data<-data%>%
group_by(ID)%>%
mutate(BuyHoldReturn=ifelse(Date==EventDate, ....

Thanks in advance!


Solution

  • You can do use dplyr as follows:

    # ensure Date and EventDate are Date columns
    df <- df %>% mutate(across(c(Date,EventDate), ~as.Date(.x)))
    
    
    left_join(
      select(df,ID, Date), 
      inner_join(df %>% select(-EventDate),filter(df,!is.na(EventDate)) %>% distinct(ID, EventDate), by="ID") %>%
        filter(abs(EventDate-Date)<=6) %>% 
        group_by(ID, EventDate) %>% 
        mutate(BuyHoldReturn = c(NA,Price[-1]/Price[1]-1)),
      by=c("ID", "Date")
    )
    

    Output:

       ID       Date Price  EventDate BuyHoldReturn
    1   1 2011-03-06    10 2011-03-12            NA
    2   1 2011-03-07     9 2011-03-12   -0.10000000
    3   1 2011-03-08    12 2011-03-12    0.20000000
    4   1 2011-03-09    14 2011-03-12    0.40000000
    5   1 2011-03-10    15 2011-03-12    0.50000000
    6   1 2011-03-11    17 2011-03-12    0.70000000
    7   1 2011-03-12    12 2011-03-12    0.20000000
    8   1 2011-03-13    14 2011-03-12    0.40000000
    9   1 2011-03-14    17 2011-03-12    0.70000000
    10  1 2011-03-15    14 2011-03-12    0.40000000
    11  1 2011-03-16    17 2011-03-12    0.70000000
    12  1 2011-03-17    16 2011-03-12    0.60000000
    13  1 2011-03-18    15 2011-03-12    0.50000000
    14  1 2011-03-18    15 2011-03-24            NA
    15  1 2011-03-19    16 2011-03-24    0.06666667
    16  1 2011-03-20    17 2011-03-24    0.13333333
    17  1 2011-03-21    18 2011-03-24    0.20000000
    18  1 2011-03-22    11 2011-03-24   -0.26666667
    19  1 2011-03-23    15 2011-03-24    0.00000000
    20  1 2011-03-24    12 2011-03-24   -0.20000000
    21  1 2011-03-25    13 2011-03-24   -0.13333333
    22  1 2011-03-26    15 2011-03-24    0.00000000
    23  2 2011-06-11    NA       <NA>            NA
    24  2 2011-06-12    NA       <NA>            NA
    25  2 2011-06-13    50 2011-06-19            NA
    26  2 2011-06-14    57 2011-06-19    0.14000000
    27  2 2011-06-15    60 2011-06-19    0.20000000
    28  2 2011-06-16    49 2011-06-19   -0.02000000
    29  2 2011-06-17    64 2011-06-19    0.28000000
    30  2 2011-06-18    63 2011-06-19    0.26000000
    31  2 2011-06-19    67 2011-06-19    0.34000000
    32  2 2011-06-20    70 2011-06-19    0.40000000
    33  2 2011-06-21    58 2011-06-19    0.16000000
    34  2 2011-06-22    65 2011-06-19    0.30000000
    35  2 2011-06-23    57 2011-06-19    0.14000000
    36  2 2011-06-24    55 2011-06-19    0.10000000
    37  2 2011-06-25    57 2011-06-19    0.14000000
    38  2 2011-06-26    NA       <NA>            NA
    

    This can be done similarly using data.table

    library(data.table)
    setDT(df)
    df[,(c("Date", "EventDate")):=lapply(.SD, as.Date), .SDcols=c("Date", "EventDate")]
    df[,!c("EventDate")][unique(df[!is.na(EventDate), .(ID, EventDate)]), on="ID", allow.cartesian=T][
      abs(EventDate-Date)<=6][,BuyHoldReturn:=c(NA,Price[-1]/Price[1]-1), .(ID,EventDate)][
        df[,.(ID,Date)], on=.(ID,Date)
    ]
    
    

    Both of the above approaches use cartesian joins, and then filter. If you have a very large table and you are looking for some increase in speed, you can also use a data.table non-equi join to isolate the rows before and after the EventDate(s) for each ID

    1. load library and set data to data.table
    library(data.table)
    setDT(dt)
    
    1. Create an events frame, which just has the ID and Events
    events = unique(df[!is.na(EventDate),.(ID,EventDate)])
    
          ID  EventDate
       <int>     <IDat>
    1:     1 2011-03-12
    2:     1 2011-03-24
    3:     2 2011-06-19
    
    1. Add some helper columns to the frames (this facilitates retention of columns in non-equi joins
    events[, eDate:=EventDate]
    df[, `:=`(s=Date-6, e=Date+6)]
    
    1. Use a non-equi join: require ID to match and EventDate to be between the lower and upper bounds created above (i.e. s and e)
    bhr = events[df, on=.(ID, EventDate>=s, EventDate<=e), nomatch=0]
    
    1. Generate the BuyHoldReturn column, by ID and EventDate
    bhr = bhr[, .(Date, BuyHoldReturn=c(NA, Price[-1]/Price[1] -1)), by = .(ID,eDate)]
    
    1. Merge back on the original frame to retain the rows with NA, and select columns with desired name and arrangement.
    bhr[df,on=.(ID,Date),.(ID,Date,Price,EventDate=i.EventDate,BuyHoldReturn)]
    

    Input:

    structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Date = c("2011-03-06", 
    "2011-03-07", "2011-03-08", "2011-03-09", "2011-03-10", "2011-03-11", 
    "2011-03-12", "2011-03-13", "2011-03-14", "2011-03-15", "2011-03-16", 
    "2011-03-17", "2011-03-18", "2011-03-19", "2011-03-20", "2011-03-21", 
    "2011-03-22", "2011-03-23", "2011-03-24", "2011-03-25", "2011-03-26", 
    "2011-06-11", "2011-06-12", "2011-06-13", "2011-06-14", "2011-06-15", 
    "2011-06-16", "2011-06-17", "2011-06-18", "2011-06-19", "2011-06-20", 
    "2011-06-21", "2011-06-22", "2011-06-23", "2011-06-24", "2011-06-25", 
    "2011-06-26"), Price = c(10L, 9L, 12L, 14L, 15L, 17L, 12L, 14L, 
    17L, 14L, 17L, 16L, 15L, 16L, 17L, 18L, 11L, 15L, 12L, 13L, 15L, 
    48L, 49L, 50L, 57L, 60L, 49L, 64L, 63L, 67L, 70L, 58L, 65L, 57L, 
    55L, 57L, 60L), EventDate = c(NA, NA, NA, NA, NA, NA, "2011-03-12", 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-03-24", NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-06-19", NA, NA, NA, 
    NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -37L
    ))