Search code examples
rfinancedummy-variable

Turn of month dummy variable


I have a dataset on a stock exchange's daily closing price and their respective dates for several years. I have further created a counter, counting which trading day in the month each day is (because the dataset is excluding weekends and holidays). It looks like this:

df$date <- as.Date(c("2017-03-25","2017-03-26","2017-03-27","2017-03-29","2017-03-30",
                     "2017-03-31","2017-04-03","2017-04-04","2017-04-05","2017-04-06",
                     "2017-04-07","2017-04-08","2017-04-09"))

df$DayofMonth <- c(18,19,20,21,22,23,1,2,3,4,5,6,7)

df$price <- (100, 100.53, 101.3 ,100.94, 101.42, 101.40, 101.85, 102, 101.9, 102, 102.31, 102.1, 102.23)

I would now like to create a dummyvariable taking the value 1 for the last 3 trading days and the first 5 trading days of the following month, for every month. So it would in this case look something like this:

    df$ToM_dummy <- c(0,0,0,1,1,1,1,1,1,1,1,0,0)

Thanks for helping out!


Solution

  • Here's a dplyr solution. It's probably a little more complex than it needs to be for your real data because your sample stops on the 7th day of a month, and the algorithm needs to know that 7 isn't really the end of the month - the data is just incomplete for that month.

    I have therefore arbitrarily added a cutoff of 18 days to indicate that if there are less trading days than that in a month we can assume the data for that month is incomplete. You may wish to change this if needed (I have no idea whether there are always more than 18 trading days in December or February, for example)

    library(dplyr)
    
    df %>% 
      mutate(month = lubridate::month(date)) %>% 
      group_by(month) %>%
      mutate(ToM_dummy = +(DayofMonth < 6 | 
                          (DayofMonth > (max(DayofMonth) - 3) &
                          max(DayofMonth) > 18))) # Change to appropriate number
    #> # A tibble: 13 x 5
    #> # Groups:   month [2]
    #>    date       DayofMonth price month ToM_dummy
    #>    <date>          <dbl> <dbl> <dbl>     <int>
    #>  1 2017-03-25         18  100      3         0
    #>  2 2017-03-26         19  101.     3         0
    #>  3 2017-03-27         20  101.     3         0
    #>  4 2017-03-29         21  101.     3         1
    #>  5 2017-03-30         22  101.     3         1
    #>  6 2017-03-31         23  101.     3         1
    #>  7 2017-04-03          1  102.     4         1
    #>  8 2017-04-04          2  102      4         1
    #>  9 2017-04-05          3  102.     4         1
    #> 10 2017-04-06          4  102      4         1
    #> 11 2017-04-07          5  102.     4         1
    #> 12 2017-04-08          6  102.     4         0
    #> 13 2017-04-09          7  102.     4         0
    

    Data

    df <-  structure(list(date = structure(c(17250, 17251, 17252, 17254, 
    17255, 17256, 17259, 17260, 17261, 17262, 17263, 17264, 17265
    ), class = "Date"), DayofMonth = c(18, 19, 20, 21, 22, 23, 1, 
    2, 3, 4, 5, 6, 7), price = c(100, 100.53, 101.3, 100.94, 101.42, 
    101.4, 101.85, 102, 101.9, 102, 102.31, 102.1, 102.23)), row.names = c(NA, 
    -13L), class = "data.frame")
    
    df
    #>          date DayofMonth  price
    #> 1  2017-03-25         18 100.00
    #> 2  2017-03-26         19 100.53
    #> 3  2017-03-27         20 101.30
    #> 4  2017-03-29         21 100.94
    #> 5  2017-03-30         22 101.42
    #> 6  2017-03-31         23 101.40
    #> 7  2017-04-03          1 101.85
    #> 8  2017-04-04          2 102.00
    #> 9  2017-04-05          3 101.90
    #> 10 2017-04-06          4 102.00
    #> 11 2017-04-07          5 102.31
    #> 12 2017-04-08          6 102.10
    #> 13 2017-04-09          7 102.23