I would like to track min and max occurrences in two columns. This should be done in rolling fashion from beginning of the data, so we can track how many times overall IDs are present at each date. Also it doesn't matter in which column ID is present.
Result should be as follows. Row 1, nor B or C has occurred, so min_appearance is 0 but max_appearance is 1 as A and D was present. Row 5 A and D have been present 3 times at this point but B and C only 2. I'm not concerned which ID is present, but only on counts what is min and max. Also real data is more complicated, so pairs are not static, but A could face C and so on.
# A tibble: 8 x 5
date id1 id2 min_appearances max_appearances
<date> <chr> <chr> <dbl> <dbl>
1 2020-01-01 A D 0 1
2 2020-01-02 B C 1 1
3 2020-01-03 C B 1 2
4 2020-01-04 D A 2 2
5 2020-01-05 A D 2 3
6 2020-01-06 B C 3 3
7 2020-01-07 C B 3 4
8 2020-01-08 D A 4 4
DATA:
library(dplyr)
date <- seq(as.Date("2020/1/1"), by = "day", length.out = 8)
id1 <- rep(c("A", "B", "C", "D"), 2)
id2 <- rep(c("D", "C", "B", "A"), 2)
dt <- tibble(date = date,
id1 = id1,
id2 = id2)
Here's a way to do it using functions from the tidyverse
. First, pivot_longer to handle more easily the data. Then compute the cumulative count of value for every unique ids
. Compute the min and max for each row over the "count" columns. Finally, take the last min and max values for each pairs, and pivot back to wide.
library(tidyverse)
dt %>%
pivot_longer(cols = -date, values_to = "id") %>%
mutate(map_dfc(unique(id), ~ tibble("count_{.x}" := cumsum(id == .x)))) %>%
mutate(min_appearances = do.call(pmin, select(., starts_with("count"))),
max_appearances = do.call(pmax, select(., starts_with("count")))) %>%
group_by(date) %>%
mutate(across(min_appearances:max_appearances, last),
n = row_number()) %>%
pivot_wider(c(date, min_appearances, max_appearances), names_from = n, values_from = id, names_prefix = "id") %>%
relocate(order(colnames(.)))
date id1 id2 max_appearances min_appearances
<date> <chr> <chr> <int> <int>
1 2020-01-01 A D 1 0
2 2020-01-02 B C 1 1
3 2020-01-03 C B 2 1
4 2020-01-04 D A 2 2
5 2020-01-05 A D 3 2
6 2020-01-06 B C 3 3
7 2020-01-07 C B 4 3
8 2020-01-08 D A 4 4