Search code examples
rtidyverse

In R, I want to pivot_wider within group_by without merging to one row


I have grouped data that I want to change to wide format – but using pivot_wider merges all the rows in to lists.

Please see example data, code and wanted output below.

# Example data
name <- c("pre", "post", "pre", "post")
id <- c(1, 1, 2, 2)
gender <- as.factor(c("M", NA, "F", NA))
text <- c("one two", "three four", "A B", "C D")
df <- tibble(name, id, gender, text)

# Works to pivot one group
df %>% 
  group_by(id) %>% 
  filter(id == 1) %>% 
  pivot_wider(names_from = name,
              values_from = c(id, gender, text),
              values_fn = list(. =  ~ toString(unique(.))))

# Does not pivot within each group.  
df %>% 
  group_by(id) %>% 
  #  filter(id == 1) %>% 
  pivot_wider(names_from = name,
              values_from = c(id, gender, text),
              values_fn = list(. =  ~ toString(unique(.))))

# This is how I would like the data to be
id_pre  <- c(1, 2)
id_post  <- c(1, 2)
gender_pre  <- as.factor(c("M", "F"))
gender_post  <- c(NA, NA)
text_pre  <- c("one two", "A B")
text_post <- c("three four", "C D")
tibble(id_pre,id_post, gender_pre, gender_post,
       text_pre,text_post)

Solution

  • pivot_wider doesn't work with group_by - instead you should use a column grouper as an id_col in pivot_wider.

    This case is a little strange because you both want to use the id column as a grouper column and a value column. I think it works best if you duplicate the column to use one copy in each way. You can, of course, drop it at the end:

    df |>
      mutate(id_orig = id) |>
      pivot_wider(
        names_from = name,
        values_from = c(id, gender, text),
        values_fn = list(. =  ~ toString(unique(.))))
      )
    # # A tibble: 2 × 7
    #   id_orig id_pre id_post gender_pre gender_post text_pre text_post 
    #     <dbl>  <dbl>   <dbl> <fct>      <fct>       <chr>    <chr>     
    # 1       1      1       1 M          NA          one two  three four
    # 2       2      2       2 F          NA          A B      C D 
    

    (Use, e.g., ... |> select(-id_orig) to drop the column. I just leave it in to illustrate.)

    Since id doesn't change pre- to post- another option would be to eliminate it from the pivot values (just using it as an id_col) and copy/rename it after:

     df |>
      pivot_wider(
        names_from = name,
        values_from = c(gender, text),
        values_fn = list(. =  ~ toString(unique(.x)))
      ) |>
      select(
        id_pre = id,
        id_post = id,
        everything()
      )
    # # A tibble: 2 × 6
    #   id_pre id_post gender_pre gender_post text_pre text_post 
    #    <dbl>   <dbl> <fct>      <fct>       <chr>    <chr>     
    # 1      1       1 M          NA          one two  three four
    # 2      2       2 F          NA          A B      C D