Search code examples
rdateformatfactors

Dates from factor to another format so I can find start and end dates per group


I'm pretty new to R and I can't figure my problem. I have been through countless forms.

My dataset looks like this: Glimpse of dataset

I want to find the first and last date of each Event and put it in a nice table. There are 26 events. However, the dates are in factor format which makes me unable to find the start and end date. And when I try to convert it to a numeric format I get NA for each value, and when I try to convert it to a date format it stays in factor format.

Can someone help me out?

As suggested I tried to find a way to share my dataset using dput. I tried it and I think this should work to get an 2x8 sample of my dataset.

df <- structure(list(`Release Date` = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Names = c("", 
                                                            "", "", "", "", 
                                                            "", "", ""), .Label = c("3/17/2020", "Release Date", "6/16/2020", "9/15/2020", 
                                                                            "12/16/2020", "12/17/2015", "6/17/2013", "9/17/2012", "6/14/2012", 
                                                                            "3/15/2012", "6/20/2011", "3/16/2011", "12/16/2010", "9/14/2010", 
                                                                            "6/16/2010", "3/17/2010", "12/15/2009", "9/15/2009", "6/16/2009", 
                                                                            "3/13/2009", "12/12/2008", "9/15/2008", "6/13/2008", "3/14/2008", 
                                                                            "12/13/2007", "9/12/2007", "6/14/2007", "3/15/2007", "12/14/2006", 
                                                                            "9/14/2006", "6/16/2006", "3/17/2006", "12/15/2005", "10/18/2005", 
                                                                            "9/21/2005", "7/15/2005", "6/21/2005", "4/15/2005", "3/15/2005", 
                                                                            "1/18/2005", "12/15/2004", "10/27/2004", "9/15/2004", "7/28/2004" 
                                                                            ), class = "factor"), Event = structure(c(2L, 
                                                                                                                                                                               2L, 2L, 2L, 3L, 3L, 3L, 3L), .Names = c("", "", "", "", "", "", "", ""), .Label = c("Event", "Labour Costs YoY", 
                                                                                                                                                                                                                                   "Unemployment Change (000's)", "Unemployment Rate", "Jobseekers Net Change"
                                                                                                                                                                               ), class = "factor")), row.names = c("X.1", "X.11",  "X.12", "X.13", "X.14", "X.15", "X.16", "X.17"), class = "data.frame")

Solution

  • After converting the dates to date object you can use min and max to get first and last dates for each event.

    library(dplyr)
    
    df %>%
      mutate(`Release Date` = as.Date(`Release Date`, '%m/%d/%Y')) %>%
      group_by(Event) %>%
      summarise(first_date = min(`Release Date`), 
                last_date = max(`Release Date`))
    
    #  Event                       first_date last_date 
    #  <fct>                       <date>     <date>    
    #1 Labour Costs YoY            2020-03-17 2020-12-16
    #2 Unemployment Change (000's) 2012-06-14 2015-12-17