Search code examples
rcountunique

Creating a count table of unique values when there are multiple values in a single cell using R


I am trying to create a count table from a data table that looks like this:

df <- data.frame("Spring" = c("skirt, pants, shirt", "tshirt"), "Summer" = 
c("shorts, skirt", "pants, shoes"), Fall = c("Scarf", "purse, pants"))

               Spring        Summer         Fall
1 skirt, pants, shirt shorts, skirt        Scarf
2              tshirt  pants, shoes purse, pants

and then a count table that looks like this in the end:

output <- data.frame("Spring" = 4, "Summer" = 4, Fall = 3)

  Spring Summer Fall
1      4      4    3

So, I would just like it to count the unique values in a column for each season. I am having trouble with this because of the commas separating values within 1 cell. I tried using length(unique())), but it is not giving me the correct number because of the columns.

Any help is appreciated!!!


Solution

  • One tidyverse possibility could be:

    df %>%
     mutate_if(is.factor, as.character) %>%
     gather(var, val) %>%
     mutate(val = strsplit(val, ", ")) %>%
     unnest() %>%
     group_by(var) %>%
     summarise(val = n_distinct(val))
    
      var      val
      <chr>  <int>
    1 Fall       3
    2 Spring     4
    3 Summer     4
    

    If you want to match the desired output exactly, then you can add spread():

    df %>%
     mutate_if(is.factor, as.character) %>%
     gather(var, val) %>%
     mutate(val = strsplit(val, ", ")) %>%
     unnest() %>%
     group_by(var) %>%
     summarise(val = n_distinct(val)) %>%
     spread(var, val)
    
       Fall Spring Summer
      <int>  <int>  <int>
    1     3      4      4
    

    Or using the basic idea from @Sonny (this requires just dplyr):

    df %>%
     mutate_if(is.factor, as.character) %>%
     summarise_all(list(~ n_distinct(unlist(strsplit(., ", ")))))
    
      Spring Summer Fall
    1      4      4    3