The yearmon
class from the zoo
package seems particularly useful for working with data that only has a year and month, but no day or time. However, I find it difficult to work with, particularly when trying to filter out a range of dates.
yearmon
is kind of a weird data class. It displays dates like Jun 2002
. This is more aesthetically pleasing, but (correct me if I'm wrong) underneath it's the year and a fraction of the month, with January being 0
and December being 11
. So, Jun 2002
, for example, would actually be 2002.417
(i.e., 5/12) under the hood.
Let's assume now that you have monthly data over a ten year period (2000-2009), kinda like this.
date value
<yearmon> <dbl>
1 Jan 2000 10000
2 Feb 2000 30000
3 Mar 2000 250
4 Apr 2000 20
5 May 2000 50000
6 Jun 2000 -90042
7 Jul 2000 73400
8 Aug 2000 4317
9 Sep 2000 1000
10 Oct 2000 -22
You want to filter out rows that fall between June 2002 and November 2003 and keep everything else in your data set.
%in%
is good for keeping only a specific range of values, but I often find taking the inverse of this function just as useful, like the following.
%!in% <- Negate(%in%)
My understanding is that you can't work with what yearmon
displays. Meaning, you can't actually reference dates like "Jun 2003"
, but instead you need to work with what's stored under the hood (i.e., 2002.417
). I would think it'd be sufficient to use filter()
from dplyr
to filter out the date range through something like the following
my_data |>
filter(date %!in% c(2002.417:2003.833))
However, I find that this doesn't seem to work, as the months from June 2002 to November 2003 won't be removed. I think it might have something to do with the colon not working with yearmon
somehow? I tried only removing June 2002 (i.e., removed the colon and 2003.833) and it was able to remove that single date. However, I find it doesn't like it when you try to specify a range of dates to remove. If the dates were numeric, say, written as 200206
and 200311
, I've found that running my above code works in removing that period.
Does anyone know how to filter out a range of dates in class yearmon
? I'm surprised that there doesn't seem to be much stuff about filtering out date ranges (usually I find online tutorials only show how to keep a specific range, rather than exclude).
Also, because I don't see a lot of discussion on it either, what are peoples thoughts on the yearmon
class from zoo
? Do you find it useful? Many tutorials only seem to use yearmon
for extracting month or year into a separate column. Am I wrong from keeping dates with only a month and year in class yearmon
? Is it better to store them as class date
and just add 01
at the end for the day so it's like 2002-06-01
?
I've tried using !=, instead of %!in%. I also tried feeding in c("Jun 2002": "Nov 2003") but I get an NA/NAN error message. I've also tried converting it to a date
class and using c("2002-06-01:"2003-11-01") but I think I also got a NA/NAN error, even after dropping the quotation marks and swapping the hyphens for slashes c("2002/06/01:"2003/11/01"). I also tried dropping the concatenate sign and brackets in case the numeric date values were sufficient, but that didn't work either. I also tried playing with the Negate of the between()
function, but found that it didn't work and was making the code more complicated than it ought to be.
I have never used yearmon
before, but I believe your error is stemming from the use of :
. Using a colon like 2000:2003
will produce integer values from 2000 to 2003.
Using non-integer values increments by 1 until the end point is exceeded. This would not produce the desired matches in your situation.
c(2002.417:2003.833)
[1] 2002.417 2003.417
The function you are looking for is seq()
.
x <- seq(from = 2002.417, to = 2003.833, by = 1/12)
x
[1] 2002.417 2002.500 2002.584 2002.667 2002.750 2002.834 2002.917 2003.000 2003.084
[10] 2003.167 2003.250 2003.334 2003.417 2003.500 2003.584 2003.667 2003.750
as.yearmon(x)
[1] "Jun 2002" "Jul 2002" "Aug 2002" "Sep 2002" "Oct 2002" "Nov 2002" "Dec 2002"
[8] "Jan 2003" "Feb 2003" "Mar 2003" "Apr 2003" "May 2003" "Jun 2003" "Jul 2003"
[15] "Aug 2003" "Sep 2003" "Oct 2003"
You'll notice Nov 2003
(or 2003+((11-1)/12) = 2003.834
) is missing. This is because seq()
cannot exceed the to
parameter. Increasing the to
parameter by a small amount will remedy that issue.
Using the appropriate seq()
call in place of c()
in your filter()
should return the correct subset of data.