My question is tangental from this previous question: How to summarise by group AND get a summary of the overall dataset using dplyr in R
Using the reprex from the reply to set up the new question:
library(tidyverse)
set.seed(500)
dat <- tibble(
treatment = sample(c("Group1", "Group2", "Group3"), 100, replace = TRUE),
recruitment_strategy = sample(c("Strategy 1", "Strategy 2", "Strategy 3", "Strategy 4", "Strategy 5"), 100, replace = TRUE),
Variable_A = rnorm(100),
Variable_B = rnorm(100),
Variable_C = rnorm(100)
)
Now lets make it so that one treatment x strategy combination does not occur in the data
dat2 <- dat %>%
filter(!(recruitment_strategy == "Strategy 1" & treatment == "Group1"))
And run the previous solution:
dat2 %>%
inner_join(
x = count(., treatment, recruitment_strategy) %>% spread(treatment, n),
y = count(., recruitment_strategy, name = "Overall_dataset"),
by = "recruitment_strategy"
) %>%
mutate_at(
.vars = vars(-recruitment_strategy),
.funs = ~ str_glue("{.} ({scales::percent(. / sum(.), accuracy = 1)})")
)
The group which doesn't occur shows up with NA count:
# A tibble: 5 × 5
recruitment_strategy Group1 Group2 Group3 Overall_dataset
<chr> <glue> <glue> <glue> <glue>
1 Strategy 1 NA (NA) 13 (30%) 4 (16%) 17 (17%)
2 Strategy 2 8 (NA) 6 (14%) 6 (24%) 20 (20%)
3 Strategy 3 6 (NA) 12 (27%) 3 (12%) 21 (21%)
4 Strategy 4 9 (NA) 4 (9%) 5 (20%) 18 (18%)
5 Strategy 5 6 (NA) 9 (20%) 7 (28%) 22 (22%)
My question is how can I make it show 0 count instead of NA?
I have tried modifying the count()
arguments with .drop = FALSE
but it does not make a difference
x = count(., treatment, recruitment_strategy, .drop = FALSE) %>% spread(treatment, n),
Any other ideas anyone? Thx
Ok I found my own solution. The trick is to make sure recruitment_strategy
is a factor variable before counting with .drop = FALSE
also set. Also, I realised spread
is a bit outdated so I've changed it to use pivot_wider
instead also:
dat2 %>%
mutate( recruitment_strategy = as.factor( recruitment_strategy)) %>%
inner_join(
x = count(., treatment, recruitment_strategy, .drop = FALSE) %>%
pivot_wider(id_cols = recruitment_strategy,
names_from = treatment,
values_from = n),
y = count(., recruitment_strategy, name = "Overall_dataset"),
by = "recruitment_strategy"
) %>%
mutate_at(
.vars = vars(-recruitment_strategy),
.funs = ~ str_glue("{.} ({scales::percent(. / sum(.), accuracy = 1)})")
)
# A tibble: 5 × 5
recruitment_strategy Group1 Group2 Group3 Overall_dataset
<fct> <glue> <glue> <glue> <glue>
1 Strategy 1 0 (0%) 13 (30%) 4 (16%) 17 (17%)
2 Strategy 2 8 (28%) 6 (14%) 6 (24%) 20 (20%)
3 Strategy 3 6 (21%) 12 (27%) 3 (12%) 21 (21%)
4 Strategy 4 9 (31%) 4 (9%) 5 (20%) 18 (18%)
5 Strategy 5 6 (21%) 9 (20%) 7 (28%) 22 (22%)