Search code examples
rzooas.date

Negative months as difference between two dates


I am aware that there are some posts similar to this here on stackoverflow. But they did not directly address my issue. Here is my issue:

I have a variable called earliest_cr_line which contains dates as Jan-01. This is a string variable. I need to create a variable called "test" which should contain the difference between earliest_cr_line and Dec-2007 in months. To this end, I ran the following codes:

library(zoo)
loan_data$earliest_cr_line_date <- as.yearmon(loan_data$earliest_cr_line, "%b-%y")
ref_date <- as.yearmon("Dec-07", "%b-%y")
loan_data$test <- round((as.Date(ref_date) - 
as.Date(loan_data$earliest_cr_line_date))/(365.25/12))

However, the newly created variable test contains many negative numbers as well. I figured out that when converting earliest_cr_line from string to yearmon, R misinterpreted years which were before 1970. For example, yearmon converted Jan-60 into Nov 2060 instead of Nov 1960. That's what is causing the negative output. Any idea how I should approach this problem?

Thanks.


Solution

  • Date's integer is a day, making day-to-month determination inconsistent. yearmon's integer is a year, which makes a month just 1/12, a bit simpler to deal with. If you start with zoo's yearmon object, then I suggest you stick with it instead of trying convert to/from R's Date object.

    Handling wrong years is an annoying Y2K problem ... while this below will generally work (assuming that everything you're looking at is in the past), I urge you to fix this problem at the source. (I am astounded that something somewhere still thinks that 2-digit years is acceptable. *shrug*)

    vec <- c("Nov-60","Nov-70","Nov-71","Jan-01","Mar-05","Dec-07")
    (out <- zoo::as.yearmon(vec, format="%b-%y"))
    # [1] "Nov 2060" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
    (wrongcentury <- as.integer(gsub(".* ", "", out)) > as.integer(format(Sys.Date(), "%Y")))
    # [1]  TRUE FALSE FALSE FALSE FALSE FALSE
    vec[wrongcentury]
    # [1] "Nov-60"
    zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
    # [1] "Nov 1960"
    out[wrongcentury] <- zoo::as.yearmon(gsub("-", "-19", vec[wrongcentury]), format = "%b-%Y")
    out
    # [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
    

    Edit: much more concise recommendation from G. Grothendieck:

    out <- zoo::as.yearmon(vec, format="%b-%y")
    out - 100 * (out > zoo::as.yearmon(Sys.Date()))
    # [1] "Nov 1960" "Nov 1970" "Nov 1971" "Jan 2001" "Mar 2005" "Dec 2007"
    

    If your source data ever comes close to 1920, then this inferential solution will further break. (More reason to fix it at the source :-)