I have a dataframe with monitorings of some variables and each register is associated with a key1. However, I want to relate this variables with others which are associated with a different new key2. To do so, I have a table which shows the relation between both keys.
This is simple, I guess I only need to create a new column adding the new key2 to the dataframe with mutate(), but the tricky part is that more than one value of key2 can be related with the same key1. Because key1 means customers, and key2 means events. Luckely, the table with shows dependencies of both key1 and key2, shows a range of time as well. So I could be able to distinguish which key2 I have to mutate in my dataframe using the time of the registers and the intervals of the events. I don't know how to do it though.
I provide here a simpler example, 12 samples of register associated with 3 customers (01,02,03), which corresponds to 4 events (A,B,C,D).
library(lubridate)
library(tidyverse)
set.seed(123)
df1 <- data.frame(
Key1 = c('01','01','01','02','02','02','02','02','02','03','03','03'),
type = c('R','R','R','C','C','R','C','R','R','C','C','R'),
value = dnorm(1:12,mean=5,sd=10),
hour = c(now(),now()+hours(1),now()+hours(2),now()-hours(338),now()-hours(337),
now()-hours(336), now()-hours(48),now()-hours(47),now()-hours(46),
now(),now()+hours(20),now()+hours(30))
)
df2 <- data.frame(
key2 = c('A','B','C','D'),
Key1 = c('02','02','01','03'),
time = c(interval(now()-hours(340),now()-hours(299)),
interval(now()-hours(60),now()-hours(46)),
interval(now()-hours(10),now()+hours(10)),
interval(now()-hours(5),now()+hours(50))))
)
I would like to know if there's a way to create a new column adding the key2 of the event which the hour of the register lies in for every sample in df1. I know how to do it with a for loop, but I want to learn a way to do it with tydyverse. Thanks!
You can do this:
df1 %>%
left_join(df2 %>% pivot_wider(values_from = time,
names_from = c(key2)),
by = "Key1") %>%
pivot_longer(cols = df2$key2,
values_to = "values_df2",
names_to = "key2") %>%
filter(!is.na(values_df2) & hour %within% values_df2)
which gives:
# A tibble: 12 x 6
Key1 type value hour key2 values_df2
<chr> <chr> <dbl> <dttm> <chr> <Interval>
1 01 R 0.0368 2021-10-25 11:34:35 C 2021-10-25 01:34:38 CEST--2021-10-25 21:34:38 CEST
2 01 R 0.0381 2021-10-25 12:34:35 C 2021-10-25 01:34:38 CEST--2021-10-25 21:34:38 CEST
3 01 R 0.0391 2021-10-25 13:34:35 C 2021-10-25 01:34:38 CEST--2021-10-25 21:34:38 CEST
4 02 C 0.0397 2021-10-11 09:34:35 A 2021-10-11 07:34:38 CEST--2021-10-13 00:34:38 CEST
5 02 C 0.0399 2021-10-11 10:34:35 A 2021-10-11 07:34:38 CEST--2021-10-13 00:34:38 CEST
6 02 R 0.0397 2021-10-11 11:34:35 A 2021-10-11 07:34:38 CEST--2021-10-13 00:34:38 CEST
7 02 C 0.0391 2021-10-23 11:34:35 B 2021-10-22 23:34:38 CEST--2021-10-23 13:34:38 CEST
8 02 R 0.0381 2021-10-23 12:34:35 B 2021-10-22 23:34:38 CEST--2021-10-23 13:34:38 CEST
9 02 R 0.0368 2021-10-23 13:34:35 B 2021-10-22 23:34:38 CEST--2021-10-23 13:34:38 CEST
10 03 C 0.0352 2021-10-25 11:34:35 D 2021-10-25 06:34:38 CEST--2021-10-27 13:34:38 CEST
11 03 C 0.0333 2021-10-26 07:34:35 D 2021-10-25 06:34:38 CEST--2021-10-27 13:34:38 CEST
12 03 R 0.0312 2021-10-26 17:34:35 D 2021-10-25 06:34:38 CEST--2021-10-27 13:34:38 CEST
Obviously, if you only need to add the key, you can get rid of the last column by select(-values_df2)
, which then gives:
# A tibble: 12 x 5
Key1 type value hour key2
<chr> <chr> <dbl> <dttm> <chr>
1 01 R 0.0368 2021-10-25 11:34:35 C
2 01 R 0.0381 2021-10-25 12:34:35 C
3 01 R 0.0391 2021-10-25 13:34:35 C
4 02 C 0.0397 2021-10-11 09:34:35 A
5 02 C 0.0399 2021-10-11 10:34:35 A
6 02 R 0.0397 2021-10-11 11:34:35 A
7 02 C 0.0391 2021-10-23 11:34:35 B
8 02 R 0.0381 2021-10-23 12:34:35 B
9 02 R 0.0368 2021-10-23 13:34:35 B
10 03 C 0.0352 2021-10-25 11:34:35 D
11 03 C 0.0333 2021-10-26 07:34:35 D
12 03 R 0.0312 2021-10-26 17:34:35 D