Dear StackOverflow users,
I am struggling to implement a for loop. I have a dataframe with a column Time (YMD-HMS format) and another column with particulate matter data. Furthermore, I have a dataframe with start and stop moments:
library(lubridate) # provides ymd_hms()
# TIMEPOINTS log
start <- c(ymd_hms("2020-03-06 19:43:00",
"2020-03-06 19:47:00",
"2020-03-06 19:53:00",
"2020-03-06 20:00:00",
"2020-03-06 20:13:00",
"2020-03-06 20:22:00",
"2020-03-06 20:32:00",
"2020-03-06 20:36:00",
"2020-03-06 20:42:00",
"2020-03-06 20:45:00",
"2020-03-06 20:49:00",
"2020-03-06 21:01:00",
"2020-03-06 21:04:00",
"2020-03-06 21:06:00",
"2020-03-06 21:09:00",
"2020-03-06 21:12:00"))
end <- c(ymd_hms("2020-03-06 19:46:00",
"2020-03-06 19:49:00",
"2020-03-06 19:55:00",
"2020-03-06 20:02:00",
"2020-03-06 20:15:00",
"2020-03-06 20:24:00",
"2020-03-06 20:34:00",
"2020-03-06 20:38:00",
"2020-03-06 20:44:00",
"2020-03-06 20:47:00",
"2020-03-06 20:51:00",
"2020-03-06 21:03:00",
"2020-03-06 21:06:00",
"2020-03-06 21:08:00",
"2020-03-06 21:11:00",
"2020-03-06 21:14:00"))
df <- data.frame(start, end)
I wish to create a new dataframe with all datapoints without these specific timepoints, like this (but then using a for loop, iterating over the various start and end points):
dat2 <- dat %>% .[.[["Time"]] >= df$start[1],] %>%
.[.[["Time"]] <= df$end[1],]
I know this can be done using a for loop and I tried to figure it out for my case, but I'm a bit lost.
Any help is highly appreciated!
To start with, I’d clean up your current code slightly:
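dat2 <- dat %>% .[.$Time >= df$start[1] & .$Time <= df$end[1],]
By combining the two conditions with &, you reduce two subset operations to one. (It has to be & rather than &&, because .$Time is a vector and && only accepts single logical values.) And using $… reduces clutter compared to [["…"]] in this case.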
Next, I suggest extracting this condition into a function (in fact, that function already exists in the ‘dplyr’ package: between). This allows us to write the code as
dat2 <- dat %>% filter(between(Time, df$start[1], df$end[1]))
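As a quick illustration of what between does for a single interval, here is a minimal sketch on toy data. The names dat_demo, first_start and first_end and all values are made up for the example; they stand in for your real dat and the first row of df.

```r
library(dplyr)

# Toy stand-in for the real `dat` (invented values): one row per minute.
dat_demo <- data.frame(
  Time = as.POSIXct("2020-03-06 19:40:00", tz = "UTC") + 60 * 0:9,
  PM = 1:10
)

# Hypothetical stand-ins for df$start[1] and df$end[1].
first_start <- as.POSIXct("2020-03-06 19:43:00", tz = "UTC")
first_end <- as.POSIXct("2020-03-06 19:46:00", tz = "UTC")

# between() is inclusive on both ends: first_start <= Time & Time <= first_end.
inside_first <- dat_demo %>% filter(between(Time, first_start, first_end))
```

Since both boundaries are included, the rows at 19:43 and 19:46 are kept as well.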
Now we want to vectorise this to check for overlap with any interval:
dat2 <- dat %>% filter(between_any(Time, df$start, df$end))
Now we need to write that between_any function. Let’s start by implementing it for a single query value:
between_any1 = function (x, left, right) {
  any(x >= left & x <= right)
}
Note the use of & here rather than &&: we are comparing x against the whole vectors left and right, and & is the vectorised (element-wise) counterpart of &&, which only handles single logical values. That is, 4 >= (1 : 3) & 4 <= (3 : 5) results in c(FALSE, TRUE, TRUE).
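To make the single-value version concrete, here is a small sketch with plain numbers instead of timestamps. The interval boundaries are invented for the example; the function body is the one shown above.

```r
# Same definition as above, repeated so the snippet runs on its own.
between_any1 <- function(x, left, right) {
  any(x >= left & x <= right)
}

# Three made-up closed intervals: [1, 3], [10, 12], [20, 25].
left <- c(1, 10, 20)
right <- c(3, 12, 25)

# Element-wise comparison: one logical value per interval.
per_interval <- 11 >= left & 11 <= right

hit <- between_any1(11, left, right)  # 11 lies in [10, 12]
miss <- between_any1(5, left, right)  # 5 lies in none of the intervals
```

Here per_interval has one entry per interval, and any() collapses it into a single TRUE or FALSE.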
Now we need to make this work when x is a vector. We could use the base R function Vectorize, but I generally find it better to do it manually:
between_any = function (x, left, right) {
  map_lgl(x, ~ any(.x >= left & .x <= right))
}
This uses ‘purrr’, but we could just as well have used sapply or vapply.
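For reference, a base R variant of the same idea might look like the sketch below. The name between_any_base and the numeric test values are made up for the illustration; vapply with logical(1) guarantees a logical vector with one element per entry of x, which is what filter expects.

```r
# Base R sketch: same logic as between_any, without 'purrr'.
between_any_base <- function(x, left, right) {
  vapply(x, function(xi) any(xi >= left & xi <= right), logical(1))
}

# Made-up closed intervals: [1, 3], [10, 12], [20, 25].
left <- c(1, 10, 20)
right <- c(3, 12, 25)

flags <- between_any_base(c(2, 5, 11, 30), left, right)
```

The result has the same length as x, so it can be used directly as a row filter.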
Oh, and it sounds like you wanted to filter out times falling into the ranges in your df, so you need to invert the condition for filter:
dat2 <- dat %>% filter(! between_any(Time, df$start, df$end))
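Since the question explicitly asks about a for loop, here is a rough base R sketch of that route for comparison. It builds a logical mask one interval at a time and then drops the matching rows. The data frames dat_demo and df_demo are toy stand-ins with invented values, not your real data.

```r
# Toy stand-ins for `dat` and `df` (invented values).
dat_demo <- data.frame(
  Time = as.POSIXct("2020-03-06 19:40:00", tz = "UTC") + 60 * 0:14,
  PM = 1:15
)
df_demo <- data.frame(
  start = as.POSIXct(c("2020-03-06 19:43:00", "2020-03-06 19:47:00"), tz = "UTC"),
  end = as.POSIXct(c("2020-03-06 19:46:00", "2020-03-06 19:49:00"), tz = "UTC")
)

# Accumulate a mask: TRUE where Time falls into at least one interval.
in_any <- rep(FALSE, nrow(dat_demo))
for (i in seq_len(nrow(df_demo))) {
  in_interval <- dat_demo$Time >= df_demo$start[i] & dat_demo$Time <= df_demo$end[i]
  in_any <- in_any | in_interval
}

# Keep only the rows that lie outside every interval.
dat2_demo <- dat_demo[!in_any, ]
```

This should give the same rows as the filter(! between_any(…)) call; the loop just makes the iteration over intervals explicit.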