Search code examples
rdplyr

Make dataframe into wide format and group columns by pairing


I have a dataframe that can be simplified to this:

example <- data.frame(
  PID = c(1091, 1358, 1087, 2358, 2087, 2091),
  value = c('1','2', '1,4,6', '2', '2', '1')
)

I want to do a few things with this dataframe:

  1. make this dataframe into a wide format where PIDs are grouped together by the last 1-3 digits, with value spread out in wide format accordingly.
  2. Then, based on the pairing of value for each PID group, create a new column category based on the following pairing:
  • if value_A and value_B are 1 and 2 (who's who does not matter, as long as the pairing is 1 and 2 for both PIDs), category should be "mw"
  • if value_A and value_B are the same number (e.g. 1+1 or 2+2), category should be "mm"
  • all other value pairings should give "other" in category

This is the solution dataframe:

example_solution <- data.frame(
  PID_A = c(1091, 1358, 1087),
  PID_B = c(2091, 2358, 2087),
  value_A = c('1','2', '1,4,6'),
  value_B = c('1', '2', '2'),
  category = c("mw", "mm", "other")
)

Does anyone have any insight on how I may achieve this?

Thank you!


Solution

  • library(tidyverse)
    
    example %>%
      separate_wider_position(PID, c(a=1, b = 3), cols_remove = FALSE)%>%
      pivot_wider(names_from = a, values_from = c(PID, value), 
                  names_glue = "{.value}_{c(`1` = 'A', `2` = 'B')[a]}") %>%
      mutate(category = case_when(
               value_A == value_B ~ 'mm',
               value_A == 1 & value_B== 2 ~ 'mw',
               value_A == 2 & value_B== 1 ~ 'mw', 
               .default = 'other'), b = NULL)
    
    # A tibble: 3 × 5
      PID_A PID_B value_A value_B category
      <dbl> <dbl> <chr>   <chr>   <chr>   
    1  1091  2091 1       2       mw      
    2  1358  2358 2       2       mm      
    3  1087  2087 1,4,6   2       other