Search code examples
rexceldatedate-conversionas.date

Trouble converting and separating dates in R


I have a dataset containing some dates listed in the CSV as dd/mm/yy and my final aim is to separate out the year column into a separate column, however it seems to be giving me some very odd results.

When the CSV is input into R, the dataset (dframe1) shows the dates as factors automatically e.g. 8/3/05 but they aren't all in this format, some are 8/3/2005 and some are 08/03/2005 (as they should be) which means that when I convert them, some of the dates are coming up like they should (i.e., 2004-11-5) and some are coming out really strange (i.e., 0004-11-5).. I checked on the excel spreadsheet and they are all formatted exactly the same, written in the same etc., and the code is exactly the same for converting, but it just doesn't seem to work.

Here is an example of what it looks like what I import into R, even though in excel they are all the same:

2   11/11/04
3   11/11/04
4   11/11/04
5   11/11/04
6   11/11/04
7   11/11/04
8   11/11/04
9   11/11/04
10  5/3/05
11  5/3/05
12  5/3/05
13  5/3/05
14  5/3/05
15  5/3/05
16  5/3/05
17  3/11/05
18  3/11/05
19  3/11/05
20  3/11/05
21  3/11/05
22  3/11/05
23  3/11/05
24  3/11/05
25  3/11/05
26  3/11/05
27  9/6/06
28  9/6/06
29  9/6/06
30  9/6/06
31  9/6/06
32  9/6/06
33  9/6/06
34  9/6/06
35  11/8/06
36  11/8/06
37  11/8/06
38  11/8/06
39  11/8/06
40  11/8/06
41  11/8/06
42  11/8/06
43  22/02/2007
44  22/02/2007
45  22/02/2007
46  22/02/2007
47  22/02/2007
48  22/02/2007
49  22/02/2007
50  7/2/08
51  7/2/08
52  7/2/08
53  7/2/08
54  7/2/08
55  7/2/08
56  8/5/08
57  8/5/08
58  8/5/08
59  8/5/08
60  25/03/2012
61  25/03/2012
62  25/03/2012
63  25/03/2012
64  25/03/2012
65  25/03/2012

The code I am wanting to use to convert is as below:

data = dframe1

Set.date, Haul.date, Date.depart and Date.return are all columns within dframe1

First I make sure R knows that the date columns are actually dates (rather than factors, the automatic assumption)

dframe1$Set.date <- as.Date(dframe1$Set.date,"%d/%m/%Y")
dframe1$Haul.date <- as.Date(dframe1$Haul.date, format ="%d/%m/%Y")
dframe1$Date.depart <- as.Date(dframe1$Date.depart, format ="%d/%m/%Y")
dframe1$Date.return <- as.Date(dframe1$Date.return, format ="%d/%m/%Y")

Next I want to separate and add a column for year (can also do with day and month but won't bother this time)

dframe1$Set.year <- format(dframe1$Set.date[1], "%Y")
dframe1$Haul.year <- format(dframe1$Haul.date[1], "%Y")
dframe1$Year.depart <- format(dframe1$Date.depart[1], "%Y") 
dframe1$Year.return <- format(dframe1$Date.return[1], "%Y")

Solution

  • If all your dates are in the last 14 years, then this should work:

    dat$dat2 <- gsub("/([0-9]{2})$", "/20\\1", dat$date)
    library(lubridate)
    
    dat$dat3 <- dmy(dat$dat2)
    

    The gsub function will ignore teh items that do not have exactly 2 digits between a forward-slash and the end of the string. If you have some values in the 1900's then there should be ways of converting those. Perhaps:

    dat$dat2 <- gsub("/([7-9][0-9])$", "/19\\1", dat$date)