A question as I learn dplyr and its ilk.
I am calculating a tally and a relative frequency of a factor conditioned on two other variables in a df. For instance:
library(dplyr)
library(tidyr)
set.seed(3457)
pct <- function(x) {x/sum(x)}
foo <- data.frame(x = rep(seq(1:3),20),
y = rep(rep(c("a","b"),each=3),10),
z = LETTERS[floor(runif(60, 1,5))])
bar <- foo %>%
group_by(x, y, z) %>%
tally %>%
mutate(freq = (n / sum(n)) * 100)
head(bar)
I'd like the output, bar, to include all the levels of foo$z. For example, there are no cases of C here:
subset(bar, x==2 & y=="a")
How can I have bar tally the missing levels so that I get:
subset(bar, x==2 & y=="a",select = n)
to return 4, 5, 0, 1 (and select = freq to give 40, 50, 0, 10)?
Many thanks.
Edit: Ran with the seed set!
We can use complete() from tidyr to add the missing (x, y, z) combinations, filling n and freq with 0:
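bar1 <- bar %>%
  ungroup() %>%  # bar is still grouped by x and y after tally(); ungroup so complete() can expand across them
  complete(z, nesting(x, y), fill = list(n = 0, freq = 0)) %>%
  select(all_of(names(bar)))  # restore the original column order (select_() is deprecated)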
filter(bar1, x==2 & y=="a")
# x y z n freq
# <int> <fctr> <fctr> <dbl> <dbl>
#1 2 a A 4 40
#2 2 a B 5 50
#3 2 a C 0 0
#4 2 a D 1 10
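As a minimal sketch of the same idea on data small enough to check by hand, the example below builds a toy data frame (the names toy and toy_counts and the values are hypothetical, not from the question) in which the group x = 1, y = "a" has no "C" rows. It completes the counts first and computes freq afterwards, so only n needs a fill value. This ordering is an assumption about what reads most cleanly, not the only way to do it.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: the group (x = 1, y = "a") has no "C" rows
toy <- data.frame(
  x = c(1, 1, 1, 2, 2, 2),
  y = "a",
  z = c("A", "A", "B", "A", "B", "C"),
  stringsAsFactors = FALSE
)

toy_counts <- toy %>%
  count(x, y, z) %>%                                # ungrouped tally per (x, y, z)
  complete(nesting(x, y), z, fill = list(n = 0L)) %>%  # add missing z values per observed (x, y)
  group_by(x, y) %>%
  mutate(freq = n / sum(n) * 100) %>%               # relative frequency within each (x, y)
  ungroup()

filter(toy_counts, x == 1)
```

Because nesting(x, y) only uses combinations of x and y that occur in the data, no rows are invented for (x, y) pairs that never appear; only the missing z values are filled in.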