How to join tibbles/dataframes by using the 'nearest time' of a grouping column in R?

I have this dataset exemplified below:

library(lubridate)
library(tidyverse)

id <- c("A", "A", "B", "B")
date <- ymd(c("2017-11-26", "2017-11-26", "2017-11-26", "2017-11-29"))
time <- hms(c("09:25:30", "10:10:02", "09:15:36", "9:50:55"))
variable1 <- c("67", "30", "28", "90")
df <- tibble(id, date, time, variable1)
df

A tibble: 4 x 4
  id    date       time       variable1
  <chr> <date>     <Period>   <chr>    
1  A     2017-11-26 9H 25M 30S 67       
2  A     2017-11-26 10H 10M 2S 30       
3  B     2017-11-26 9H 15M 36S 28       
4  B     2017-11-29 9H 50M 55S 90

I need to add for each row in this dataset two new variables (latitude and longitude) based on a second dataset, using the columns "id", "date" and "time" as conditions to create the new columns.

However, the "time" is not exactly the same between the two datasets, and thus I need this condition to be an approximation (i.e. the nearest 'time' available for a given 'id' and 'date'). Another issue is that the second dataset has more rows, and thus, by selecting the 'nearest time', some rows will be ignored when adding to the first dataset.

Please see this example for the second dataset:

id <- c("A", "A", "A", "B", "B", "B")
date <- ymd(c("2017-11-26", "2017-11-26", "2017-11-30", "2017-11-26",         
"2017-11-26", "2017-11-29"))
time <- hms(c("09:00:00", "10:00:00", "08:00:00", "09:00:00", 
"13:00:00", "10:00:00"))
lat <- c(-30.6456, -29.5648, -27.6667, -31.5587, -30.6934, -29.3147)
long <- c(-50.4879, -49.8715, -51.8716, -50.4456, -50.9842, -51.9787)
df2 <- tibble(id, date, time, lat, long)
df2

# A tibble: 6 x 5
 id    date       time        lat  long
 <chr> <date>     <Period>  <dbl> <dbl>
1  A   2017-11-26 9H 0M 0S  -30.6 -50.5
2  A   2017-11-26 10H 0M 0S -29.6 -49.9
3  A   2017-11-30 8H 0M 0S  -27.7 -51.9
4  B   2017-11-26 9H 0M 0S  -31.6 -50.4
5  B   2017-11-26 13H 0M 0S -30.7 -51.0
6  B   2017-11-29 10H 0M 0S -29.3 -52.0

Therefore, I first need to group by "id", then by "date", and then extract the "lat" and "long" in the dataset 2 for the nearest "time" in the dataset 1, creating the two new columns in the dataset 1.

The resulting dataset would look like this:

df_output
# A tibble: 4 x 6
  id    date       time       variable1   lat  long
 <chr> <date>     <Period>   <chr>     <dbl> <dbl>
1  A    2017-11-26 9H 25M 30S 67      -30.6 -50.5
2  A    2017-11-26 10H 10M 2S 30      -29.6 -49.9
3  B    2017-11-26 9H 15M 36S 28      -31.6 -50.4
4  B    2017-11-29 9H 50M 55S 90      -29.3 -52.0

I've tried with group_by(), map2_dfr(), merge(), left_join(), case_when(), etc.. but couldn't find how to do this.

Solution

You may use data.table like

library(data.table)
df <- df %>%
  mutate(time = paste(date, time) %>% ymd_hms()) %>%
  select(-date)
df2 <- df2 %>%
  mutate(time = paste(date, time) %>% ymd_hms()) %>%
  select(-date)


df <- data.table(df, key = c("id", "time"))
df2 <- data.table(df2, key = c("id", "time"))
df2[df, list(id, time, variable1, lat, long), roll = "nearest"]

   id                time variable1      lat     long
1:  A 2017-11-26 09:25:30        67 -30.6456 -50.4879
2:  A 2017-11-26 10:10:02        30 -29.5648 -49.8715
3:  B 2017-11-26 09:15:36        28 -31.5587 -50.4456
4:  B 2017-11-29 09:50:55        90 -29.3147 -51.9787

I'm not sure about variable1's condition, why it's different between df and df_output in your question.