Search code examples
rdplyrtidyrlong-format-datawide-format-data

Use pivot_wider create new value columns based on id/group column


Probably this is can be done fairly easily, but my own tries did not give the desired output.

I have a simple data frame with some descriptive statistics.

set.seed(0712)
tibble(group = rep(1:3, each = 2),
       scale = rep(LETTERS[1:3], 2),
           n = c(100, 100, 150, 150, 135, 135),
        mean = runif(6, 1, 5),
          sd = runif(6))

    A tibble: 6 x 5
  group scale     n  mean    sd
  <int> <chr> <dbl> <dbl> <dbl>
1     1 A       100  2.42 0.387
2     1 B       100  2.27 0.343
3     2 C       150  4.61 0.709
4     2 A       150  4.51 0.207
5     3 B       135  4.93 0.653
6     3 C       135  3.05 0.253

The first column indicates the group, and each group has three values (n, mean, sd) for two different scales so that the number of groups covers all possible scale combinations (e.g. AB, AC, BC).

I would simply like to rearrange my data frame into the following overview in a wide format where the values of each group are placed within two columns per value instead of two rows.

   group scale_1 scale_2  n_1  n_2  mean_1  mean_2  sd_1   sd_2
1     1        A       B  100  100  2.42    2.27    0.387  0.343
2     2        C       A  150  150  4.61    4.51    0.709  0.207
3     3        B       C  135  135  4.93    3.05    0.653  0.253

However, using pivot_wider the situation gets messy as every value x scale combination receives its own column, but not every group has values for each scale. I understand that from a tidy data perspective that is useful, but for a simple overview I would rather have the columns lumped together into value_1 and value_2, etc.

> df %>% tidyr::pivot_wider(id_cols = group, names_from = scale, values_from = n:sd)
# A tibble: 3 x 10
  group   n_A   n_B   n_C mean_A mean_B mean_C   sd_A   sd_B   sd_C
  <int> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1   100   100    NA   2.42   2.27  NA     0.387  0.343 NA    
2     2   150    NA   150   4.51  NA      4.61  0.207 NA      0.709
3     3    NA   135   135  NA      4.93   3.05 NA      0.653  0.253

Any ideas on how to collapse this data frame?


Solution

  • To get the 1s and 2s in your names rather than A, B and C you could add a new temporary value by row_number() before pivoting:

    library(tidyverse)
    
    set.seed(0712)
    df <- tibble(group = rep(1:3, each = 2),
           scale = rep(LETTERS[1:3], 2),
           n = c(100, 100, 150, 150, 135, 135),
           mean = runif(6, 1, 5),
           sd = runif(6))
    
    df |> 
      group_by(group) |> 
      mutate(scaleno = row_number()) |> 
      pivot_wider(names_from = scaleno, values_from = scale:sd)
    #> # A tibble: 3 × 9
    #> # Groups:   group [3]
    #>   group scale_1 scale_2   n_1   n_2 mean_1 mean_2  sd_1  sd_2
    #>   <int> <chr>   <chr>   <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
    #> 1     1 A       B         100   100   2.42   2.27 0.387 0.343
    #> 2     2 C       A         150   150   4.61   4.51 0.709 0.207
    #> 3     3 B       C         135   135   4.93   3.05 0.653 0.253
    

    This correctly lines up the value 1 and value 2 of each comparison by assigning a universal key, then uses these keys in naming the new columns.