Search code examples
rtidyr

How to use pivot_wider keeping each duplicate separate


How to pivot_wider this small dataset? I want to put the race as column headers with each Count but for an unknown reason it puts all the Counts into a list. Also getting the warning below:

library(tidyverse)
a <- structure(list(Count = c(1, 1, 3, 1, 2, 1, 2, 1, 3, 1, 1, 2, 
2, 1, 3, 3, 3, 5, 3, 3), race = c("L", "F", "W", "F", "F", "LF", 
"F", "F", "F", "F", "F", "F", "F", "S", "F", "F", "F", "F", "F", 
"F"), year = c("2012", "2013", "2013", "2013", "2013", "2013", 
"2013", "2012", "2013", "2013", "2012", "2013", "2013", "2013", 
"2013", "2013", "2013", "2013", "2013", "2013")), row.names = c(NA, 
20L), class = "data.frame")

a
 Count race year
1      1    L 2012
2      1    F 2013
3      3    W 2013
4      1    F 2013
5      2    F 2013
6      1   LF 2013
7      2    F 2013
8      1    F 2012
9      3    F 2013
10     1    F 2013
11     1    F 2012
12     2    F 2013
13     2    F 2013
14     1    S 2013
15     3    F 2013
16     3    F 2013
17     3    F 2013
18     5    F 2013
19     3    F 2013
20     3    F 2013

a %>% pivot_wider(names_from = race, values_from = Count)

# A tibble: 2 x 6
  year  L         F          W         LF        S        
  <chr> <list>    <list>     <list>    <list>    <list>   
1 2012  <dbl [1]> <dbl [2]>  <NULL>    <NULL>    <NULL>   
2 2013  <NULL>    <dbl [14]> <dbl [1]> <dbl [1]> <dbl [1]>


     Warning message:
Values from `Count` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = {summary_fun}` to summarise duplicates.
* Use the following dplyr code to identify duplicates.
  {data} %>%
  dplyr::group_by(year, race) %>%
  dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
  dplyr::filter(n > 1L)  

EDITED: Expected output

year  L         F          W         LF        S        
  2012  1         1          0          0        0   
  2012  0         1          0          0        0
  2013  0         1          3          1        0
  2013  0         1          0          0        0
  2013 ...       ...        ...        ...      ...
  2013 ...       ...        ...        ...      ...
  2013 ...       ...        ...        ...      ...
  2013  0         3          0          0        0     
  2013 ...       ...        ...        ...      ...

Solution

  • If I understand, you want something like below. Since you don't want the multiple observations for each race-year pair to be combined or aggregated in any way, you can make a new variable (I called it obs) that is the observation number within race-year pair and thus this variable will keep those values separate rather than trying to combine or aggregate them.

    library(dplyr)
    library(tidyr)
    a <- structure(list(Count = c(1, 1, 3, 1, 2, 1, 2, 1, 3, 1, 1, 2, 
    2, 1, 3, 3, 3, 5, 3, 3), race = c("L", "F", "W", "F", "F", "LF", 
    "F", "F", "F", "F", "F", "F", "F", "S", "F", "F", "F", "F", "F", 
    "F"), year = c("2012", "2013", "2013", "2013", "2013", "2013", 
    "2013", "2012", "2013", "2013", "2012", "2013", "2013", "2013", 
    "2013", "2013", "2013", "2013", "2013", "2013")), row.names = c(NA, 
    20L), class = "data.frame")
    
    a %>% 
      group_by(race, year) %>% 
      mutate(obs = row_number()) %>% 
      pivot_wider(names_from = "race", values_from="Count", values_fill=0) %>% 
      arrange(year, obs) %>% 
      select(-obs)
    #> # A tibble: 16 × 6
    #> # Groups:   year [2]
    #>    year      L     F     W    LF     S
    #>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
    #>  1 2012      1     1     0     0     0
    #>  2 2012      0     1     0     0     0
    #>  3 2013      0     1     3     1     1
    #>  4 2013      0     1     0     0     0
    #>  5 2013      0     2     0     0     0
    #>  6 2013      0     2     0     0     0
    #>  7 2013      0     3     0     0     0
    #>  8 2013      0     1     0     0     0
    #>  9 2013      0     2     0     0     0
    #> 10 2013      0     2     0     0     0
    #> 11 2013      0     3     0     0     0
    #> 12 2013      0     3     0     0     0
    #> 13 2013      0     3     0     0     0
    #> 14 2013      0     5     0     0     0
    #> 15 2013      0     3     0     0     0
    #> 16 2013      0     3     0     0     0
    

    Created on 2023-09-03 with reprex v2.0.2