Create columns based on date filter within group

I have the following data:

> dput(my_data)
structure(list(id = c(1, 1, 1, 2, 2, 3, 3), begin = c("2017-01-01", 
"2017-08-01", "2022-05-01", "2017-01-01", "2017-09-01", "2017-01-01", 
"2017-09-01"), end = c("2017-07-01", "2022-04-01", "2023-06-01", 
"2017-08-01", "2023-06-01", "2017-08-01", "2023-06-01"), position = c("position_1", 
"position_2", "position_3", "position_1", "position_2", "position_1", 
"position_1")), row.names = c(NA, -7L), class = "data.frame")


 id      begin        end   position
1  1 2017-01-01 2017-07-01 position_1
2  1 2017-08-01 2022-04-01 position_2
3  1 2022-05-01 2023-06-01 position_3
4  2 2017-01-01 2017-08-01 position_1
5  2 2017-09-01 2023-06-01 position_2
6  3 2017-01-01 2017-08-01 position_1
7  3 2017-09-01 2023-06-01 position_1

What I want: I am trying to create a new column (e.g. first_position_in_2023) that contains the first position for each id that was present in 2023-01-01. (The first position with a timestamp begin-end that includes 2023-01-01.

So for my Data:

id == 1: position_3
id == 2: position_2
id == 2: position_1

Ideally I would want to be able to create multiple columns like:

first_position_in_2017
first_position_in_2018
first_position_in_2019 ...

What I tried:


my_data %>%
  group_by(id) %>%
  summarise(first_position_in_2023 = position[begin <= (as.Date("2023-01-01"))])


# A tibble: 7 x 2
# Groups:   id [3]
     id first_position_in_2023
  <dbl> <chr>                 
1     1 position_1            
2     1 position_2            
3     1 position_3            
4     2 position_1            
5     2 position_2            
6     3 position_1            
7     3 position_1

This seems to give me just the result per Row and not the first result for each group.

Solution

Since you include all the tags, you could try this dplyr/tidyr/lubridate approach for the broader goal of doing it for each year:

library(dplyr)
library(tidyr)

df %>%
  mutate(year = lubridate::year(end)) %>% select(-c(begin, end)) %>%
  pivot_wider(names_from = year, values_from = position, 
              names_glue = "first_position_in_{year}")

Output:

    id first_position_in_2017 first_position_in_2022 first_position_in_2023
  <dbl> <chr>                  <chr>                  <chr>                 
1     1 position_1             position_2             position_3            
2     2 position_1             NA                     position_2            
3     3 position_1             NA                     position_1

Per your comment, if you wanted to create columns for years that are not in the data (ie, 2019, 2020, etc), you could use complete just before pivot_wider:

df %>%
  mutate(year = lubridate::year(end)) %>% select(-c(begin, end)) %>%
  complete(year = min(year):max(year)) %>%
  pivot_wider(names_from = year, values_from = position, 
              names_glue = "first_position_in_{year}") %>% filter(!is.na(id))

Output

     id first_position_in_2017 first_position_in_2018 first_position_in_2019 first_position_in_2020 first_position_in_2021 first_position_in_2022 first_position_in_2023
  <dbl> <chr>                  <chr>                  <chr>                  <chr>                  <chr>                  <chr>                  <chr>                 
1     1 position_1             NA                     NA                     NA                     NA                     position_2             position_3            
2     2 position_1             NA                     NA                     NA                     NA                     NA                     position_2            
3     3 position_1             NA                     NA                     NA                     NA                     NA                     position_1