Search code examples
rdateconditional-statements

Move the values of variable up a month in dataset - R


I have a variable x in my dataset (Jan 2018 - June 2021):

    date       | specific_date | x
    2018-01-31 | 2019-10-31    | 0
                   ...
    2019-08-31 | 2019-10-31    | 0
    2019-09-30 | 2019-10-31    | 0
    2019-10-31 | 2019-10-31    | 1
    2019-11-30 | 2019-10-31    | NA
    2019-12-31 | 2019-10-31    | NA
                   ...
    2021-06-30 | 2019-10-31    | NA

I would like to move everything up a month in x, like this (1 goes from date=2019-10-31 to date=2019-09-30, and so on):

date       | specific_date | variable
2018-01-31 | 2019-10-31    | 0
                   ...
2019-08-31 | 2019-10-31    | 0
2019-09-30 | 2019-10-31    | 1
2019-10-31 | 2019-10-31    | NA
2019-11-30 | 2019-10-31    | NA
2019-12-31 | 2019-10-31    | NA
                   ...
2021-06-30 | 2019-10-31    | NA

Any ideas on how to do this?


Solution

  • Assuming the input shown in reproducible form in the Note at the end and assuming you want to move the date column x months earlier convert it to yearmon class, subtract x/12 (since yearmon represents each date as a year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec) or subtract 0 if x is NA and then use as.Date to convert to Date class. frac=1 means convert to a date at the end of the month.

    Another possibility is to just use yearmon class and not Date class since that directly represents a year and month without a day and the end-of-month dates shown seem just to be a way of representing a year and month. In that case omit the as.Date and its frac= argument.

    library(dplyr, exclude = c("lag", "filter"))
    library(zoo)    
    
    DF %>%
      mutate(date = as.Date(as.yearmon(date) - coalesce(x, 0)/12, frac = 1))
    

    giving:

            date specific_date  x
    1 2018-01-31    2019-10-31  0
    2 2019-08-31    2019-10-31  0
    3 2019-09-30    2019-10-31  0
    4 2019-09-30    2019-10-31  1
    5 2019-11-30    2019-10-31 NA
    6 2019-12-31    2019-10-31 NA
    7 2021-06-30    2019-10-31 NA
    

    Note

    The input data frame in reproducible form:

    Lines <- "    date       | specific_date | x
        2018-01-31 | 2019-10-31    | 0
                       ...
        2019-08-31 | 2019-10-31    | 0
        2019-09-30 | 2019-10-31    | 0
        2019-10-31 | 2019-10-31    | 1
        2019-11-30 | 2019-10-31    | NA
        2019-12-31 | 2019-10-31    | NA
                       ...
        2021-06-30 | 2019-10-31    | NA"
    DF <- read.table(text = Lines, header = TRUE, sep = "|", strip.white = TRUE,
     comment.char = ".")