Based on this question and answer, that is close to satisfying my needs: R - how to make long rows by matching an element of original rows
df <- data.table::data.table(ID = c(123,123,123,234,456, 234),
event_type = c("Museum","Concert","Park","Garden","Museum", "Museum"),
event_info = c("5496:C", "5434:D", "1236:C", "534126:A", "5496:C", "0129:Z"))
and the output I want is:
ID event_type.1 event_type.2 event_type.3 event_info.1 event_info.2 event_info.3
123 Museum Concert Park 5496:C 5434:D 1236:C
234 Garden Museum NA 534126:A 0129:Z NA
456 Garden NA NA 5496:C NA NA
In my real data, the min number of times an ID appears is 2x and the maximum is 4x.
per the solution linked, I tried aggregate
but also tried a couple of group_by
approaches but the issue is that I dont want to combine any columns or make new columns. I want to be able to see, for each ID
, what are the pairs (or more than pair) of event_type
while retaining the event_info
for each of the event_type
Thank you!
First I add a variable instance
to track how many times we have encountered a given ID
so far. Then I pivot wider using instance
and the two value columns. By default, the new column names will first show the value columns (i.e. event_type
or event_info
) and then _
and the instance
number.
library(tidyverse)
df |>
mutate(instance = row_number(), .by = ID) |>
pivot_wider(names_from = instance, values_from = c(event_type, event_info))
# add names_sep = "." to get the column name formatting in the OP
Result
# A tibble: 3 × 7
ID event_type_1 event_type_2 event_type_3 event_info_1 event_info_2 event_info_3
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 123 Museum Concert Park 5496:C 5434:D 1236:C
2 234 Garden Museum NA 534126:A 0129:Z NA
3 456 Museum NA NA 5496:C NA NA