Hello I have a tibble through a pipe from tidytext::unnest_tokens()
and count(category, word, name = "count")
. It looks like this example.
owl <- tibble(category = c(0, 1, 2, -1, 0, 1, 2),
word = c(rep("hello", 3), rep("world", 4)),
count = sample(1:100, 7))
and I would like to get this tibble with an additional column that gives the number of categories the word appears in, i.e. the same number for each time the word appears.
I tried the following code that works in principal. The result is what I want.
owl %>% mutate(sum_t = sapply(1:nrow(.), function(x) {filter(., word == .$word[[x]]) %>% nrow()}))
However, seeing that my data has 10s of thousands of rows this takes a rather long time. Is there a more efficient way to achieve this?
We could use add_count
:
library(dplyr)
owl %>%
add_count(word)
output:
category word count n
<dbl> <chr> <int> <int>
1 0 hello 98 3
2 1 hello 30 3
3 2 hello 37 3
4 -1 world 22 4
5 0 world 80 4
6 1 world 18 4
7 2 world 19 4