I would like to know whether it is possible to build a FOR loop in R which would change multiple parameters at every run.
I have parameter dataframe [df_params] which looks like this:
group person date_from date_to
1 Mike 2020-10-01 12:00:00 2020-10-01 13:00:00
2 Mike 2020-10-04 09:00:00 2020-10-07 17:00:00
3 Dave 2020-10-07 12:00:00 2020-10-07 13:00:00
4 Dave 2020-10-09 09:00:00 2020-10-11 17:00:00
I would like to loop over a larger dataframe [df] and get only the rows matching parameters of individual rows in the "df_params" dataframe.
The large dataframe [df] looks like this:
person datetime books tasks done
Mike 2020-10-01 12:15:00 5 7 2
Mike 2020-10-01 12:17:00 5 7 3
Mike 2020-10-01 18:00:00 5 7 4
Mike 2020-10-02 12:00:00 5 5 0
Mike 2020-10-04 09:08:00 5 3 3
Mike 2020-10-09 12:00:00 5 7 1
Dave 2020-10-07 12:22:00 7 5 1
Dave 2020-10-08 02:34:00 7 5 2
Dave 2020-10-09 07:00:00 7 3 3
Dave 2020-10-09 08:00:00 7 8 5
Dave 2020-10-09 09:48:00 7 7 2
Nick 2020-10-01 13:00:00 3 7 3
Nick 2020-10-02 12:58:00 3 3 2
Nick 2020-10-03 10:02:00 3 7 1
The desired result would look like this:
person datetime books tasks done group
Mike 2020-10-01 12:15:00 5 7 2 1
Mike 2020-10-01 12:17:00 5 7 3 1
Mike 2020-10-04 09:08:00 5 3 3 2
Dave 2020-10-07 12:22:00 7 5 1 3
Dave 2020-10-09 09:48:00 7 7 2 4
Is something like this possible in R. Thank you very much for any suggestions.
This might be a slightly expensive solution if your datasets are very large, but it outputs the desired result.
I don't know if your date variables are already in date format; below I convert them with the lubridate
package just in case they aren't.
Also, I create the variable date_interval
that will be used later for a filtering condition.
library(dplyr)
library(lubridate)
# convert to date format
df_params <- df_params %>%
mutate(
date_from = ymd_hms(date_from),
date_to = ymd_hms(date_to),
# create interval
date_interval = interval(date_from, date_to)
)
df <- df %>%
mutate(datetime = ymd_hms(datetime))
After this manipulation step, I use a left_join
on the person name in order to have a larger dataframe - for this reason I said before that this operation might be a little expensive - and then filter
only the rows where datetime
is within the above-mentioned interval.
left_join(df, df_params, by = "person") %>%
filter(datetime %within% date_interval) %>%
select(person:group)
# person datetime books tasks done group
# 1 Mike 2020-10-01 12:15:00 5 7 2 1
# 2 Mike 2020-10-01 12:17:00 5 7 3 1
# 3 Mike 2020-10-04 09:08:00 5 3 3 2
# 4 Dave 2020-10-07 12:22:00 7 5 1 3
# 5 Dave 2020-10-09 09:48:00 7 7 2 4
Starting data
df_params <- read.table(text="
group person date_from date_to
1 Mike 2020-10-01T12:00:00 2020-10-01T13:00:00
2 Mike 2020-10-04T09:00:00 2020-10-07T17:00:00
3 Dave 2020-10-07T12:00:00 2020-10-07T13:00:00
4 Dave 2020-10-09T09:00:00 2020-10-11T17:00:00", header=T)
df <- read.table(text="
person datetime books tasks done
Mike 2020-10-01T12:15:00 5 7 2
Mike 2020-10-01T12:17:00 5 7 3
Mike 2020-10-01T18:00:00 5 7 4
Mike 2020-10-02T12:00:00 5 5 0
Mike 2020-10-04T09:08:00 5 3 3
Mike 2020-10-09T12:00:00 5 7 1
Dave 2020-10-07T12:22:00 7 5 1
Dave 2020-10-08T02:34:00 7 5 2
Dave 2020-10-09T07:00:00 7 3 3
Dave 2020-10-09T08:00:00 7 8 5
Dave 2020-10-09T09:48:00 7 7 2
Nick 2020-10-01T13:00:00 3 7 3
Nick 2020-10-02T12:58:00 3 3 2
Nick 2020-10-03T10:02:00 3 7 1 ", header=T)