Search code examples
rdplyrdata-manipulationdatediffdate-difference

r group by date difference with respect to first date


I have a dataset that looks like this.

   Id      Date1        Cars
   1       2007-04-05   72
   2       2014-01-07   12
   2       2018-07-09   10
   2       2018-07-09   13 
   3       2005-11-19   22
   3       2005-11-23   13 
   4       2010-06-17   38
   4       2010-09-23   57 
   4       2010-09-23   41
   4       2010-10-04   17

What I would like to do is for each Id get the date difference with respect to the 1st Date (Earliest) date for that Id. For each Id, (EarliestDate - 2nd Earliest Date), (EarliestDate - 3rd Earliest Date), (Earliest Date - 4th Earliest Date) ... so on.

I would end up with a dataset like this

   Id      Date1        Cars   Diff
   1       2007-04-05   72     NA
   2       2014-01-07   12     NA 
   2       2018-07-09   10     1644 = (2018-07-09 - 2014-01-07)
   2       2018-07-09   13     1644 = (2018-07-09 - 2014-01-07) 
   3       2005-11-19   22     NA
   3       2005-11-23   13     4    = (2005-11-23 - 2005-11-19)
   4       2010-06-17   38     NA
   4       2010-09-23   57     98   = (2010-09-23 - 2010-06-17)
   4       2010-09-23   41     98   = (2010-09-23 - 2010-06-17)
   4       2010-10-04   17     109  = (2010-10-04 - 2010-09-23)

I am unclear on how to accomplish this. Any help would be much appreciated. Thanks


Solution

  • Change Date1 to date class.

    df$Date1 = as.Date(df$Date1)
    

    You can subtract with the first value in each Id. This can be done using dplyr.

    library(dplyr)
    df %>% group_by(Id) %>% mutate(Diff = as.integer(Date1 - first(Date1)))
    
    
    #      Id Date1       Cars  Diff
    #   <int> <date>     <int> <int>
    # 1     1 2007-04-05    72     0
    # 2     2 2014-01-07    12     0
    # 3     2 2018-07-09    10  1644
    # 4     2 2018-07-09    13  1644
    # 5     3 2005-11-19    22     0
    # 6     3 2005-11-23    13     4
    # 7     4 2010-06-17    38     0
    # 8     4 2010-09-23    57    98
    # 9     4 2010-09-23    41    98
    #10     4 2010-10-04    17   109
    

    data.table

    setDT(df)[, Diff := as.integer(Date1 - first(Date1)), Id]
    

    OR base R :

    df$diff <- with(df, ave(as.integer(Date1), Id, FUN = function(x) x - x[1]))
    

    Replace 0's to NA if you want output as such.