r, dplyr, datatable, dataset, simulation

Generate random date after a date


I have a dataset like this:

library(dplyr)

set.seed(123)
date_entry <- sample(seq(as.Date('2000-01-01'), as.Date('2010-01-01'), by = "day"), 1000)
df <- data.frame(date_entry)
df <- df %>% mutate(id = row_number())

I want to generate a random date_end column for each id that is later than date_entry. For instance, for the dates below, date_end should be later than the 2006 entry dates for id = 1:3 and later than 2001-06-09 for id = 4.

    date_entry  id
1   2006-09-28   1
2   2006-11-15   2
3   2006-02-04   3
4   2001-06-09   4
5   2000-07-13   5

Solution

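  • For each row, create a daily sequence from date_entry to today's date (Sys.Date()), then sample one date from it as date_end. Because the sequence starts at date_entry itself, date_end can occasionally equal date_entry; start the sequence at date_entry + 1 if it must be strictly later.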

    library(tidyverse)
    
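    df %>% 
      rowwise() %>%   # evaluate seq() and sample() once per row
      mutate(date_end = sample(seq(date_entry, Sys.Date(), by = "day"), 1)) %>% 
      ungroup()       # drop the row-wise grouping afterwards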
    

    Output

       date_entry    id date_end  
       <date>     <int> <date>    
     1 2006-09-28     1 2016-01-08
     2 2006-11-15     2 2019-04-27
     3 2006-02-04     3 2016-02-17
     4 2001-06-09     4 2012-12-26
     5 2000-07-13     5 2008-11-12
     6 2008-03-04     6 2011-12-27
     7 2005-01-15     7 2015-01-04
     8 2003-02-15     8 2020-07-28
     9 2009-03-24     9 2014-11-01
    10 2003-06-06    10 2004-03-22
    # … with 990 more rows
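
A vectorized variant of the same idea, as a minimal sketch rather than part of the original answer: instead of building one sequence per row, draw a random whole-day offset for each row and add it to date_entry. It uses base R only. The fixed upper bound `max_date` and the helper vectors `days_available` and `offset` are assumptions introduced for illustration; using a fixed bound instead of Sys.Date() makes reruns with the same seed reproducible.

```r
set.seed(123)
date_entry <- sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
df <- data.frame(date_entry, id = seq_along(date_entry))

# Assumption: a fixed upper bound instead of Sys.Date(), so that reruns with
# the same seed give the same dates.
max_date <- as.Date("2024-01-01")

# Number of days between each date_entry and the upper bound
days_available <- as.integer(max_date - df$date_entry)

# Uniform whole-day offset in 1..days_available for every row;
# starting at 1 keeps date_end strictly later than date_entry.
offset <- 1 + floor(runif(nrow(df)) * days_available)

df$date_end <- df$date_entry + offset
head(df)
```

This avoids rowwise() entirely, which should matter mostly on larger data; the same `offset` expression can also be used inside mutate() if you prefer to stay in a dplyr pipeline.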