Group by, summarize, spread in R not working

I have a data frame that looks like the following:

  ID  Code  Desc
  1   0A    Red
  1   NA    Red
  2   1A    Blue
  3   2B    Green

I want to first create a new column where I concatenate the values in the Code column where the IDs are the same. So:

  ID  Combined_Code  Desc
  1    0A | NA       Red
  2    1A            Blue
  3    2B            Green

Then I want to take the original Code column and spread it. The values in this case would be a count of how many times each Code shows up for a given ID. So:

  ID  Combined_Code 0A  NA  1A  2B  Desc
  1    0A | NA      1   1   0   0   Red
  2    1A           0   0   1   0   Blue
  3    2B           0   0   0   1   Green

I've tried:

sample_data %>%
 group_by(ID) %>%
 summarise(Combined_Code = paste(unique(Combined_Code), collapse ='|'))

This works for creating the concatenation. However, I can't get this to work in tandem with spread:

 sample_data %>%
  group_by(ID) %>%
  summarise(Combined_Code = paste(unique(Combined_Code), collapse ='|'))

sample_data <- spread(count(sample_data, ID, Combined_Code, Desc., Code), Code, n, fill = 0)

Doing this spreads, but drops the concatenation. I've also tried this with filter instead of summarise, which gives the same result. This results in:

 ID  Combined_Code 0A  NA  1A  2B  Desc
  1    0A          1   0   0   0   Red
  1    NA          0   1   0   0   Red
  2    1A          0   0   1   0   Blue
  3    2B          0   0   0   1   Green

Finally, I've tried piping spread through the summarise function:

sample_data %>%
  group_by(ID) %>%
  summarise(Combined_Code = paste(unique(Combined_Code), collapse ='|')) %>%
  spread(count(sample_data, ID, Combined_Code, Desc., Code), Code, n, fill = 0)

This results in the error:

Error: `var` must evaluate to a single number or a column name, not a list
Run `rlang::last_error()` to see where the error occurred.

What can I do to solve these problems?

Solution

We can do a group by paste

library(dplyr)
library(stringr)
df1 %>%
   group_by(ID, Desc) %>%
   summarise(Combined_Code = str_c(Code, collapse="|"))
# A tibble: 3 x 3
# Groups:   ID [3]
#     ID Desc  Combined_Code
#  <int> <chr> <chr>        
#1     1 Red   0A|0B        
#2     2 Blue  1A           
#3     3 Green 2B

For the second case, after creating a 'val' column of 1s, paste the 'Code' elements afte grouping by 'ID', 'Desc', then use pivot_wider from tidyr to reshape from 'long' to 'wide format.

library(tidyr)
df1 %>% 
   mutate(val = 1) %>%
   group_by(ID, Desc) %>% 
   mutate(Combined_Code = str_c(Code, collapse="|")) %>% 
   pivot_wider(names_from = Code, values_from = val, values_fill = list(val = 0))
# A tibble: 3 x 7
# Groups:   ID, Desc [3]
#    ID Desc  Combined_Code  `0A`  `0B`  `1A`  `2B`
#  <int> <chr> <chr>         <dbl> <dbl> <dbl> <dbl>
#1     1 Red   0A|0B             1     1     0     0
#2     2 Blue  1A                0     0     1     0
#3     3 Green 2B                0     0     0     1

The OP's expected output is

  ID  Combined_Code 0A  0B  1A  2B  Desc
  1    0A | 0B      1   1   0   0   Red
  2    1A           0   0   1   0   Blue
  3    2B           0   0   0   1   Green

Update

For the updated dataset, there are NA elements in the 'Code', and by default str_c returns NA if there any NA as one of the elements, while paste still returns the NA along with the other elements. Here, we replace the str_c with paste

df2 %>% 
    mutate(val = 1) %>%
    group_by(ID, Desc) %>% 
    mutate(Combined_Code = paste(Code, collapse="|")) %>% 
    pivot_wider(names_from = Code, values_from = val, values_fill = list(val = 0))
# A tibble: 3 x 7
# Groups:   ID, Desc [3]
#     ID Desc  Combined_Code  `0A`  `NA`  `1A`  `2B`
#  <int> <chr> <chr>         <dbl> <dbl> <dbl> <dbl>
#1     1 Red   0A|NA             1     1     0     0
#2     2 Blue  1A                0     0     1     0
#3     3 Green 2B                0     0     0     1

data

df1 <- structure(list(ID = c(1L, 1L, 2L, 3L), Code = c("0A", "0B", "1A", 
"2B"), Desc = c("Red", "Red", "Blue", "Green")), 
class = "data.frame", row.names = c(NA, 
-4L))



df2 <- structure(list(ID = c(1L, 1L, 2L, 3L), Code = c("0A", NA, "1A", 
"2B"), Desc = c("Red", "Red", "Blue", "Green")), class = "data.frame",
row.names = c(NA, 
-4L))