Search code examples
rdplyr

Make dataframe into wide format and group multiple variables


I have a dataframe that looks like this:

example <- data.frame(
  PID = c(11191, 11192, 21191, 21193),
  NS = c(1,5,9,13),
  NR = c(2,6,10,14), 
  SE = c(3,7,11,15), 
  GR = c(4,8,12,16)
)

I want to make the dataframe into a wide format such that the last digit of PID is the grouping variable. Here is the desired solution:

example_solution <- data.frame(
  target = c(1119, 2119),
  NS_1 = c(1,9),
  NR_1 = c(2,10),
  SE_1 = c(3, 11),
  GR_1 = c(4,12),
  NS_2 = c(5,NA),
  NR_2 = c(6, NA),
  SE_2 = c(7, NA),
  GR_2 = c(8, NA),
  NS_3 = c(NA,13),
  NR_3 = c(NA, 14),
  SE_3 = c(NA, 15),
  GR_3 = c(NA, 16)
)

How may I achieve this? Thank you for your help!


Solution

  • library(dplyr)
    library(tidyr)
    example %>%
      mutate(one = PID %% 10, PID = PID %/% 10) %>%
      pivot_wider(id_cols = PID, names_from = one,
                  values_from = -c(PID, one), names_sep = "_")
    # # A tibble: 2 × 13
    #     PID  NS_1  NS_2  NS_3  NR_1  NR_2  NR_3  SE_1  SE_2  SE_3  GR_1  GR_2  GR_3
    #   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1  1119     1     5    NA     2     6    NA     3     7    NA     4     8    NA
    # 2  2119     9    NA    13    10    NA    14    11    NA    15    12    NA    16
    

    You can change the column order using something from dplyr: order columns alphabetically in R, perhaps %>% select(PID, sort(names(.)).