I'm having trouble figuring out how to deal with a column where each cell contains several values that I would like to tally. For example:
HTML/CSS;Java;JavaScript;Python;SQL
This is one of the cells in a column of a data frame, and I'd like to tally the occurrences of each programming language. Is this something that should be tackled with str_detect(), with corpus(), or is there another way I'm not seeing?
My goal is to turn each of these languages (HTML, CSS, Java, JavaScript, Python, SQL, etc.) into a column name, with the tally of how many times it occurs in this column of the data frame.
I might have phrased this strangely, so let me know if you need any clarification.
If you just want a total count of each label, you can use unnest_longer() and a grouped count():
# using @DPH's example data
library(dplyr)
library(tidyr)
df %>%
  mutate(PL = strsplit(PL, ";")) %>%  # split each cell into a vector of languages
  unnest_longer(PL) %>%               # one row per language
  group_by(PL) %>%
  count()                             # tally of each language
# A tibble: 6 x 2
# Groups:   PL [6]
  PL             n
  <chr>      <int>
1 HTML/CSS       2
2 Java           1
3 JavaScript     2
4 Python         1
5 R              3
6 SQL            2
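If you want the wide layout described in the question (one column per language holding its tally), the long counts can be pivoted with tidyr's pivot_wider(). This is a minimal sketch that assumes a made-up three-row data frame, since the example data referred to above isn't reproduced here; the variable name wide_counts is likewise just for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data, made up for illustration only
df <- tibble(PL = c("HTML/CSS;Java;JavaScript;Python;SQL",
                    "R;SQL",
                    "JavaScript;R"))

wide_counts <- df %>%
  # split each cell into a character vector of languages
  mutate(PL = strsplit(PL, ";")) %>%
  # one row per language
  unnest_longer(PL) %>%
  # ungrouped tally of each language
  count(PL) %>%
  # turn each language into its own column holding its count
  pivot_wider(names_from = PL, values_from = n)

wide_counts
```

The result is a single-row tibble with one column per language. A name like HTML/CSS isn't a syntactic R name, so it has to be referenced with backticks (wide_counts$`HTML/CSS`). If HTML and CSS should be counted separately, splitting on the regular expression "[;/]" instead of ";" would do that, assuming no language name itself contains a slash.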