I have a tibble where col1 is a list of character vectors of variable length, and col2 is a numeric vector indicating a group assignment, either 1 or 0. I want to first convert all of the character vectors in the list (col1) to factors, and then unify the factor levels across these factors so that I can ultimately get a tally of counts for each factor level. For the example data below, the tally would be as follows:
overall:
level, count
"a", 2
"b", 2
"c", 2
"d", 3
"e", 1
for group=1:
level, count
"a", 1
"b", 2
"c", 1
"d", 1
"e", 0
for group=0:
level, count
"a", 1
"b", 0
"c", 1
"d", 2
"e", 1
The ultimate goal is to get a total count of each factor level, c("a","b","c","d","e"), and plot these counts by the grouping variable.
Here is some code that might give better context to my problem:
library(forcats)
library(purrr)
library(dplyr)
library(ggplot2)
tib <- tibble(col1=list(c("a","b"),
c("b","c","d"),
c("a","d","e"),
c("c","d")),
col2=c(1,1,0,0))
tib %>%
mutate(col3=map(.$col1,.f = as_factor)) %>%
mutate(col4=map(.$col3,.f = fct_unify))
Unfortunately, this code fails. I get the following error, but don't know why:
Error: `fs` must be a list
I thought my input was a list?
I appreciate any help anyone might offer. Thanks.
You can first unnest col1 and then count the occurrences of each level:
library(dplyr)
library(tidyr)
tib %>%
unnest(cols = col1) %>%
#If you need col1 as a factor, uncomment the next line
#mutate(col1 = factor(col1)) %>%
count(col1)
# col1 n
# <chr> <int>
#1 a 2
#2 b 2
#3 c 2
#4 d 3
#5 e 1
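As for the error in the question: fct_unify() takes a single list of factors (its argument is named fs), but map() calls it once per row, so each call receives one factor instead of a list. That is what "`fs` must be a list" is complaining about. A minimal sketch, assuming a reasonably recent forcats, that keeps the factor approach and calls fct_unify() once on the whole list column (col3 and col4 are simply the column names used in the question):

```r
library(dplyr)
library(purrr)
library(forcats)

tib <- tibble(col1 = list(c("a","b"),
                          c("b","c","d"),
                          c("a","d","e"),
                          c("c","d")),
              col2 = c(1, 1, 0, 0))

tib <- tib %>%
  # one factor per row, levels taken from that row only
  mutate(col3 = map(col1, as_factor),
         # fct_unify() gets the whole list column at once and returns
         # a list of factors that all share the union of the levels
         col4 = fct_unify(col3))

levels(tib$col4[[1]])  # all five levels: "a" "b" "c" "d" "e"

# unlist() on a list of factors returns a single factor, so table()
# keeps levels with zero counts
table(unlist(tib$col4))                   # overall
table(unlist(tib$col4[tib$col2 == 1]))    # group 1
table(unlist(tib$col4[tib$col2 == 0]))    # group 0
```

Because every element of col4 now carries the levels a to e, the per-group tables report zeros (such as e in group 1) instead of dropping those levels.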
To count based on group, i.e. col2, convert both columns to factors and use .drop = FALSE so that combinations with zero counts are kept:
tib %>%
unnest(cols = col1) %>%
mutate_at(vars(col1, col2), factor) %>%
count(col1, col2, .drop = FALSE)
# col1 col2 n
# <fct> <fct> <int>
# 1 a 0 1
# 2 a 1 1
# 3 b 0 0
# 4 b 1 2
# 5 c 0 1
# 6 c 1 1
# 7 d 0 2
# 8 d 1 1
# 9 e 0 1
#10 e 1 0
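Since the stated goal is to plot the counts by the grouping variable, one option is to pass the grouped counts straight to ggplot2. This is a sketch; the dodged bar chart and the axis labels are my own choice, not something from the question:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

tib <- tibble(col1 = list(c("a","b"),
                          c("b","c","d"),
                          c("a","d","e"),
                          c("c","d")),
              col2 = c(1, 1, 0, 0))

counts <- tib %>%
  unnest(cols = col1) %>%
  # factors + .drop = FALSE keep zero-count combinations such as e / group 1
  mutate(col1 = factor(col1), col2 = factor(col2)) %>%
  count(col1, col2, .drop = FALSE)

# one bar per level and group, placed side by side
p <- ggplot(counts, aes(x = col1, y = n, fill = col2)) +
  geom_col(position = "dodge") +
  labs(x = "level", y = "count", fill = "group (col2)")
# print(p) to draw the plot
```

Keeping the zero-count rows means every level gets a bar slot for both groups, so the bars stay aligned across levels.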