
Compare two dates in R


I have a tab-separated text file that I imported into R. I used the following command for the import:

data = read.table(soubor, header = TRUE, sep = "\t", dec = ".", colClasses =c("numeric","numeric","character","Date","numeric","numeric"))

When I run str(data) to check the data types of my columns, I get:

'data.frame':   211931 obs. of  6 variables:
$ DataValue   : num  0 0 0 0 0 0 0 0 0 NA ...
$ SiteID      : num  1 1 1 1 1 1 1 1 1 1 ...
$ VariableCode: chr  "Sucho" "Sucho" "Sucho" "Sucho" ...
$ DateTimeUTC : Date, format: "2012-07-01" "2012-07-02" "2012-07-03" "2012-07-04" ...
$ Latitude    : num  50.8 50.8 50.8 50.8 50.8 ...
$ Longitude   : num  15.6 15.6 15.6 15.6 15.6 ...

A reproducible sample of the first 20 rows of my data is here:

my_sample <- data.frame(
  DataValue = rep(c(0, NA, 0), c(9L, 8L, 3L)),
  SiteID = rep(1, 20L),
  VariableCode = rep("Sucho", 20L),
  DateTimeUTC = as.Date(c(
    "2012-07-01", "2012-07-02", "2012-07-03", "2012-07-04", "2012-07-05",
    "2012-07-06", "2012-07-07", "2012-07-08", "2012-07-09", "2012-07-10",
    "2012-07-11", "2012-07-12", "2012-07-13", "2012-07-14", "2012-07-15",
    "2012-07-16", "2012-07-17", "2012-07-18", "2012-07-19", "2012-07-20"
  )),
  Latitude = rep(50.77, 20L),
  Longitude = rep(15.55, 20L)
)

Now I want to filter my table by date. Note that I'm running my code inside a for loop: first I subset my data to 1 July 2012 and do some processing, then I subset it to 2 July and do some processing, and so on. For example, I want to get all rows with a date equal to 6 July 2012. I tried the following code:

startDate = as.Date("2012-07-01");
endDate = as.Date("2012-07-20");
all_dates = seq(startDate, endDate, 1);

# the following code is what I'm trying to run inside a loop...
for (j in seq_along(all_dates)) {
    filterdate = all_dates[j];
    my_subset = my_sample[my_sample$DateTimeUTC == filterdate,]
    # now I want to do some processing on my_subset...
}

But the above code returns an empty dataset from iteration 7 of the loop onwards.

So, for example:

subset_one = my_sample[my_sample$DateTimeUTC == all_dates[6],]

returns: 3 obs of 6 variables.

But, for some unknown reason, the example:

subset_two = my_sample[my_sample$DateTimeUTC == all_dates[7],]

returns: 0 obs of 6 variables.

(note: I edited the above code to make my problem 100% reproducible)

Any ideas what I'm doing wrong?
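
One thing that could be checked here, purely as an assumption about the cause and not something confirmed above: R stores a Date as a number of days since 1970-01-01, and that number may carry a fractional part. Two such values can print identically while `==` reports them as different. The sketch below builds a hypothetical fractional date (`d_frac`) and a hypothetical helper (`has_fraction`) to show the effect and to count affected values.

```r
# Hypothetical example: two Date values that print the same but differ numerically.
d_whole <- as.Date("2012-07-07")
d_frac  <- structure(as.numeric(d_whole) + 0.5, class = "Date")  # same day plus half a day

same_text  <- format(d_whole) == format(d_frac)  # compares the printed forms
same_value <- d_whole == d_frac                  # compares the underlying numbers

# Hypothetical helper: flags entries of a Date vector that are not whole days.
has_fraction <- function(x) {
  n <- as.numeric(x)
  !is.na(n) & n != floor(n)
}

n_fractional <- sum(has_fraction(c(d_whole, d_frac)))
```

Running `sum(has_fraction(data$DateTimeUTC))` on the imported column would show whether any stored dates are not whole days.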


Solution

  • The following solved my problem: instead of the Date data type, I used the POSIXct data type. Here is example code for reading the tab-separated text file; with it, the subsetting worked in every step of my for loop:

    data = read.table("data.txt", header = TRUE, sep = "\t", dec = ".", 
        colClasses =c("numeric","numeric","character","POSIXct","numeric","numeric"));
    startDate = as.POSIXct("2012-07-01");
    endDate = as.POSIXct("2012-07-20");
    all_dates = seq(startDate, endDate, 86400); # 86400 is the number of seconds in a day
    
    # the following code is what I'm trying to run inside a loop...
    for (j in seq_along(all_dates)) {
        filterdate = all_dates[j];
        my_subset = data[data$DateTimeUTC == filterdate,]
        # now I want to do some processing on my_subset...
    }
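
As an alternative sketch that keeps the Date class, the per-day subsets can be built once with `split()` instead of testing equality in every iteration. This assumes the mismatch comes from dates that are not whole days, which the text above does not confirm; the small `my_sample` table and the name `by_day` are illustrative only.

```r
# Small stand-in for the imported table (values are illustrative only).
my_sample <- data.frame(
  DataValue   = c(0, 0, NA, 0),
  DateTimeUTC = as.Date(c("2012-07-06", "2012-07-06", "2012-07-07", "2012-07-08"))
)

# Normalise to whole days in case any value carries a fractional part (assumption).
my_sample$DateTimeUTC <- as.Date(floor(as.numeric(my_sample$DateTimeUTC)),
                                 origin = "1970-01-01")

# One data frame per calendar day, keyed by the date's text form.
by_day <- split(my_sample, format(my_sample$DateTimeUTC))

for (day in names(by_day)) {
  my_subset <- by_day[[day]]
  # process my_subset here
}
```

Grouping on `format(...)` compares the printed day, so rows that display the same date end up in the same subset.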