
Tally()ing Multiple Observations In an Entire Data Frame


I'm having trouble figuring out how to handle a column in which each cell contains several values that I would like to tally. For example:

HTML/CSS;Java;JavaScript;Python;SQL

This is one of the cells in a column of my data frame, and I'd like to tally the occurrences of each programming language. Should this be tackled with str_detect(), with corpus(), or is there another approach I'm not seeing?
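To show why a plain str_detect() test is fragile for this kind of data, here is a minimal sketch using stringr on a few hypothetical cells in the same semicolon-separated format. A substring match for "Java" also hits "JavaScript", so each label has to be anchored between separators. The helper `count_cells_with()` is hypothetical. It counts cells rather than repeated mentions within one cell, and it assumes a label contains no regex metacharacters (something like "C++" would need escaping).

```r
library(stringr)

# Hypothetical cells in the question's format
cells <- c("HTML/CSS;Java;JavaScript;Python;SQL", "R;SQL", "JavaScript;R")

# Plain substring test: "Java" is also found inside "JavaScript"
naive_java <- sum(str_detect(cells, fixed("Java")))  # counts 2 cells

# Anchored test: the label must sit between ';' or the start/end of the cell.
# Assumes `label` has no regex metacharacters.
count_cells_with <- function(x, label) {
  sum(str_detect(x, paste0("(^|;)", label, "(;|$)")))
}
exact_java <- count_cells_with(cells, "Java")  # counts 1 cell

# Tally several known labels at once
sapply(c("Java", "JavaScript", "R", "SQL"), count_cells_with, x = cells)
```

This only works when the set of labels is known in advance, which is one reason splitting each cell into its labels is usually the simpler route.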

My goal is to turn each of these languages (HTML, CSS, Java, JavaScript, Python, SQL, etc.) into a column name, with that column holding the number of times the language occurs in this column of the data frame.
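For the shape described here, one row with one column per language, a package-free sketch in base R is to split every cell with strsplit(), flatten the pieces with unlist(), and count them with table(). The vector `pl` is a hypothetical stand-in for the data frame column. Note that "HTML/CSS" stays a single label because the cells are only split on ";".

```r
# Hypothetical column in the question's format
pl <- c("HTML/CSS;Java;JavaScript;Python;SQL", "R;SQL", "JavaScript;R")

# Split every cell on ';', flatten into one vector, and count each label
tab <- table(unlist(strsplit(pl, ";", fixed = TRUE)))

# One-row data frame with one column per language;
# check.names = FALSE keeps labels such as "HTML/CSS" unchanged
tally_df <- data.frame(as.list(setNames(as.vector(tab), names(tab))),
                       check.names = FALSE)
tally_df
```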

I feel like I might've phrased this strangely so let me know if you need any clarification.


Solution

  • If you just want a total count of each label, you can split each cell with strsplit(), expand the resulting list column with unnest_longer(), and take a grouped count:

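    # using @DPH's example data: a data frame `df` with a character column `PL`
    library(dplyr)
    library(tidyr)
    
    df %>%
      mutate(across(PL, ~ strsplit(.x, ";"))) %>%  # each cell -> character vector
      unnest_longer(PL) %>%                        # one row per label
      group_by(PL) %>%
      count()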
    
    # A tibble: 6 x 2
    # Groups:   PL [6]
      PL             n
      <chr>      <int>
    1 HTML/CSS       2
    2 Java           1
    3 JavaScript     2
    4 Python         1
    5 R              3
    6 SQL            2
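@DPH's data frame is not reproduced in this thread, so here is a self-contained sketch of the same pipeline using a hypothetical `df` whose rows were chosen to give the counts shown above. It ends with tidyr's pivot_wider(), which spreads the long tally into the one-column-per-language layout the question asks for.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for @DPH's data (not shown in this thread)
df <- tibble(PL = c("HTML/CSS;Java;JavaScript;Python;SQL",
                    "R;SQL",
                    "HTML/CSS;JavaScript;R",
                    "R"))

wide <- df %>%
  mutate(PL = strsplit(PL, ";")) %>%              # each cell -> character vector
  unnest_longer(PL) %>%                           # one row per label
  count(PL) %>%                                   # long tally: columns PL, n
  pivot_wider(names_from = PL, values_from = n)   # one column per label
wide
```

Using count(PL) directly also returns an ungrouped result, so the "# Groups:" line from the grouped version does not appear.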