I have a dataframe that can be simplified to this:
example <- data.frame(
PID = c(1091, 1358, 1087, 2358, 2087, 2091),
value = c('1','2', '1,4,6', '2', '2', '1')
)
I want to do a few things with this dataframe:
PID
s are grouped together by the last 1-3 digits, with value
spread out in wide format accordingly.value
for each PID
group, create a new column category
based on the following pairing:value_A
and value_B
are 1 and 2 (who's who does not matter, as long as the pairing is 1 and 2 for both PID
s), category
should be "mw"value_A
and value_B
are the same number (e.g. 1+1 or 2+2), category
should be "mm"category
This is the solution dataframe:
example_solution <- data.frame(
PID_A = c(1091, 1358, 1087),
PID_B = c(2091, 2358, 2087),
value_A = c('1','2', '1,4,6'),
value_B = c('1', '2', '2'),
category = c("mw", "mm", "other")
)
Does anyone have any insight on how I may achieve this?
Thank you!
library(tidyverse)
example %>%
separate_wider_position(PID, c(a=1, b = 3), cols_remove = FALSE)%>%
pivot_wider(names_from = a, values_from = c(PID, value),
names_glue = "{.value}_{c(`1` = 'A', `2` = 'B')[a]}") %>%
mutate(category = case_when(
value_A == value_B ~ 'mm',
value_A == 1 & value_B== 2 ~ 'mw',
value_A == 2 & value_B== 1 ~ 'mw',
.default = 'other'), b = NULL)
# A tibble: 3 × 5
PID_A PID_B value_A value_B category
<dbl> <dbl> <chr> <chr> <chr>
1 1091 2091 1 2 mw
2 1358 2358 2 2 mm
3 1087 2087 1,4,6 2 other