Search code examples
rdataframedatelubridate

FOR Loop with multiple parameters from a dataframe in R


I would like to know whether it is possible to build a FOR loop in R which would change multiple parameters at every run.

I have parameter dataframe [df_params] which looks like this:

group   person   date_from             date_to
1        Mike    2020-10-01 12:00:00   2020-10-01 13:00:00
2        Mike    2020-10-04 09:00:00   2020-10-07 17:00:00
3        Dave    2020-10-07 12:00:00   2020-10-07 13:00:00
4        Dave    2020-10-09 09:00:00   2020-10-11 17:00:00

I would like to loop over a larger dataframe [df] and get only the rows matching parameters of individual rows in the "df_params" dataframe.

The large dataframe [df] looks like this:

person    datetime              books  tasks  done
Mike      2020-10-01 12:15:00   5      7      2 
Mike      2020-10-01 12:17:00   5      7      3 
Mike      2020-10-01 18:00:00   5      7      4 
Mike      2020-10-02 12:00:00   5      5      0 
Mike      2020-10-04 09:08:00   5      3      3 
Mike      2020-10-09 12:00:00   5      7      1 
Dave      2020-10-07 12:22:00   7      5      1 
Dave      2020-10-08 02:34:00   7      5      2 
Dave      2020-10-09 07:00:00   7      3      3 
Dave      2020-10-09 08:00:00   7      8      5 
Dave      2020-10-09 09:48:00   7      7      2 
Nick      2020-10-01 13:00:00   3      7      3 
Nick      2020-10-02 12:58:00   3      3      2 
Nick      2020-10-03 10:02:00   3      7      1 

The desired result would look like this:

person    datetime              books  tasks  done   group
Mike      2020-10-01 12:15:00   5      7      2      1
Mike      2020-10-01 12:17:00   5      7      3      1
Mike      2020-10-04 09:08:00   5      3      3      2
Dave      2020-10-07 12:22:00   7      5      1      3
Dave      2020-10-09 09:48:00   7      7      2      4

Is something like this possible in R. Thank you very much for any suggestions.


Solution

  • This might be a slightly expensive solution if your datasets are very large, but it outputs the desired result.

    I don't know if your date variables are already in date format; below I convert them with the lubridate package just in case they aren't.
    Also, I create the variable date_interval that will be used later for a filtering condition.

    library(dplyr)
    library(lubridate)
    
    # convert to date format
    df_params <- df_params %>% 
      mutate(
        date_from = ymd_hms(date_from),
        date_to = ymd_hms(date_to),
        # create interval
        date_interval = interval(date_from, date_to)
        )
    
    df <- df %>% 
      mutate(datetime = ymd_hms(datetime))
    

    After this manipulation step, I use a left_join on the person name in order to have a larger dataframe - for this reason I said before that this operation might be a little expensive - and then filter only the rows where datetime is within the above-mentioned interval.

    left_join(df, df_params, by = "person") %>% 
      filter(datetime %within% date_interval) %>% 
      select(person:group)
    #   person            datetime books tasks done group
    # 1   Mike 2020-10-01 12:15:00     5     7    2     1
    # 2   Mike 2020-10-01 12:17:00     5     7    3     1
    # 3   Mike 2020-10-04 09:08:00     5     3    3     2
    # 4   Dave 2020-10-07 12:22:00     7     5    1     3
    # 5   Dave 2020-10-09 09:48:00     7     7    2     4
    

    Starting data

    df_params <- read.table(text="
    group    person  date_from             date_to
    1        Mike    2020-10-01T12:00:00   2020-10-01T13:00:00
    2        Mike    2020-10-04T09:00:00   2020-10-07T17:00:00
    3        Dave    2020-10-07T12:00:00   2020-10-07T13:00:00
    4        Dave    2020-10-09T09:00:00   2020-10-11T17:00:00", header=T)
    
    df <- read.table(text="
    person    datetime              books  tasks  done
    Mike      2020-10-01T12:15:00   5      7      2 
    Mike      2020-10-01T12:17:00   5      7      3 
    Mike      2020-10-01T18:00:00   5      7      4 
    Mike      2020-10-02T12:00:00   5      5      0 
    Mike      2020-10-04T09:08:00   5      3      3 
    Mike      2020-10-09T12:00:00   5      7      1 
    Dave      2020-10-07T12:22:00   7      5      1 
    Dave      2020-10-08T02:34:00   7      5      2 
    Dave      2020-10-09T07:00:00   7      3      3 
    Dave      2020-10-09T08:00:00   7      8      5 
    Dave      2020-10-09T09:48:00   7      7      2 
    Nick      2020-10-01T13:00:00   3      7      3 
    Nick      2020-10-02T12:58:00   3      3      2 
    Nick      2020-10-03T10:02:00   3      7      1 ", header=T)