I have the following data:
> dput(my_data)
structure(list(id = c(1, 1, 1, 2, 2, 3, 3), begin = c("2017-01-01",
"2017-08-01", "2022-05-01", "2017-01-01", "2017-09-01", "2017-01-01",
"2017-09-01"), end = c("2017-07-01", "2022-04-01", "2023-06-01",
"2017-08-01", "2023-06-01", "2017-08-01", "2023-06-01"), position = c("position_1",
"position_2", "position_3", "position_1", "position_2", "position_1",
"position_1")), row.names = c(NA, -7L), class = "data.frame")
id begin end position
1 1 2017-01-01 2017-07-01 position_1
2 1 2017-08-01 2022-04-01 position_2
3 1 2022-05-01 2023-06-01 position_3
4 2 2017-01-01 2017-08-01 position_1
5 2 2017-09-01 2023-06-01 position_2
6 3 2017-01-01 2017-08-01 position_1
7 3 2017-09-01 2023-06-01 position_1
What I want:
I am trying to create a new column (e.g. first_position_in_2023
) that contains the first position for each id
that was present in 2023-01-01. (The first position with a timestamp begin-end that includes 2023-01-01.
So for my Data:
Ideally I would want to be able to create multiple columns like:
What I tried:
my_data %>%
group_by(id) %>%
summarise(first_position_in_2023 = position[begin <= (as.Date("2023-01-01"))])
# A tibble: 7 x 2
# Groups: id [3]
id first_position_in_2023
<dbl> <chr>
1 1 position_1
2 1 position_2
3 1 position_3
4 2 position_1
5 2 position_2
6 3 position_1
7 3 position_1
This seems to give me just the result per Row and not the first result for each group.
Since you include all the tags, you could try this dplyr
/tidyr
/lubridate
approach for the broader goal of doing it for each year:
library(dplyr)
library(tidyr)
df %>%
mutate(year = lubridate::year(end)) %>% select(-c(begin, end)) %>%
pivot_wider(names_from = year, values_from = position,
names_glue = "first_position_in_{year}")
Output:
id first_position_in_2017 first_position_in_2022 first_position_in_2023
<dbl> <chr> <chr> <chr>
1 1 position_1 position_2 position_3
2 2 position_1 NA position_2
3 3 position_1 NA position_1
Per your comment, if you wanted to create columns for years that are not in the data (ie, 2019, 2020, etc), you could use complete
just before pivot_wider
:
df %>%
mutate(year = lubridate::year(end)) %>% select(-c(begin, end)) %>%
complete(year = min(year):max(year)) %>%
pivot_wider(names_from = year, values_from = position,
names_glue = "first_position_in_{year}") %>% filter(!is.na(id))
Output
id first_position_in_2017 first_position_in_2018 first_position_in_2019 first_position_in_2020 first_position_in_2021 first_position_in_2022 first_position_in_2023
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 position_1 NA NA NA NA position_2 position_3
2 2 position_1 NA NA NA NA NA position_2
3 3 position_1 NA NA NA NA NA position_1