I have this dataset from a survey:
Var1 by variable value
1 Strongly disagree Cluster 1 (n = 9) A 0
2 Strongly disagree Cluster 2 (n = 15) A 0
3 Somewhat disagree Cluster 1 (n = 9) A 0
4 Somewhat disagree Cluster 2 (n = 15) A 0
5 Neither agree nor disagree Cluster 1 (n = 9) A 2
6 Neither agree nor disagree Cluster 2 (n = 15) A 0
7 Somewhat agree Cluster 1 (n = 9) A 1
8 Somewhat agree Cluster 2 (n = 15) A 0
9 Strongly agree Cluster 1 (n = 9) A 6
10 Strongly agree Cluster 2 (n = 15) A 15
11 Strongly disagree Cluster 1 (n = 9) B 1
12 Strongly disagree Cluster 2 (n = 15) B 0
13 Somewhat disagree Cluster 1 (n = 9) B 0
14 Somewhat disagree Cluster 2 (n = 15) B 0
15 Neither agree nor disagree Cluster 1 (n = 9) B 1
16 Neither agree nor disagree Cluster 2 (n = 15) B 0
17 Somewhat agree Cluster 1 (n = 9) B 4
18 Somewhat agree Cluster 2 (n = 15) B 1
19 Strongly agree Cluster 1 (n = 9) B 3
20 Strongly agree Cluster 2 (n = 15) B 14
21 Strongly disagree Cluster 1 (n = 9) C 0
22 Strongly disagree Cluster 2 (n = 15) C 0
23 Somewhat disagree Cluster 1 (n = 9) C 0
24 Somewhat disagree Cluster 2 (n = 15) C 0
25 Neither agree nor disagree Cluster 1 (n = 9) C 3
26 Neither agree nor disagree Cluster 2 (n = 15) C 0
27 Somewhat agree Cluster 1 (n = 9) C 1
28 Somewhat agree Cluster 2 (n = 15) C 3
29 Strongly agree Cluster 1 (n = 9) C 5
30 Strongly agree Cluster 2 (n = 15) C 12
I originally plotted it like so using ggplot2 to display the count of responses:
( p5 <- ggplot(q5, aes(x = Var1, y = value, fill = variable)) +
geom_bar(stat = "identity", width = 0.5, position=position_dodge2(reverse = TRUE)) +
coord_flip() +
theme(plot.title = element_text(size = 16), axis.text.x = element_text(size = 16),
axis.title.x = element_text(size = 16),
axis.title.y = element_text(size = 16),
axis.text.y = element_text(size = 16),
legend.text=element_text(size=16),
legend.title=element_text(size=16),
strip.text.x = element_text(size = 16)) +
ylim(0,20) +
scale_x_discrete(limits=c("Strongly disagree", "Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree")) +
labs(x = "", y = "# of Responses", fill = "Question") +
facet_grid(. ~ by) )
which gave me this:
However, I want to display the data as a percentage rather than count.
Following this post, I changed the code accordingly to:
( p5 <- ggplot(q5, aes(x = Var1, group = by, fill = variable)) +
stat_count(mapping = aes(y = ..prop..)) +
coord_flip() +
theme(plot.title = element_text(size = 16), axis.text.x = element_text(size = 16),
axis.title.x = element_text(size = 16),
axis.title.y = element_text(size = 16),
axis.text.y = element_text(size = 16),
legend.text=element_text(size=16),
legend.title=element_text(size=16),
strip.text.x = element_text(size = 16)) +
scale_y_continuous(limits = c(0,1),labels = scales::percent_format(accuracy = 5L)) +
scale_x_discrete(limits=c("Strongly disagree", "Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree")) +
labs(x = "", y = "% of Responses", fill = "Question") +
facet_grid(. ~ by) )
However, this gives me this plot:
It seems like the plot is not recognizing my fill argument or the ..prop.. argument for y.
How can I fix this?
I had trouble copying and pasting your data, so I made an example that resembles it:
set.seed(111)
df = expand.grid(Var1=c("strong disagree","disagree","strong agree","agree","neither"),
by=1:2,variable=LETTERS[1:3])
df$value=rnbinom(nrow(df),mu=5,size=0.5)
df$value[df$Var1=="disagree" & df$by==1]=0
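The problem is that stat_count() counts rows within each group and ignores your value column. Because group = by overrides the grouping that fill would otherwise create, every bar ends up with the same proportion and the fill is dropped. I think the easier solution is to compute the proportions first and then just plot them: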
library(ggplot2)
library(tidyr)
library(dplyr)
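# Convert counts to proportions within each facet (by) and question (variable);
# replace_na() turns the NaN from an all-zero group (0/0) into 0
df %>% group_by(by, variable) %>%
  mutate(value = replace_na(value / sum(value), 0)) %>%
  ggplot(aes(x = Var1, y = value, fill = variable)) +
  geom_col(position = "dodge") + facet_wrap(~by) +
  scale_y_continuous(labels = scales::percent_format()) +
  coord_flip()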
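Applied to the data in the question, the same idea might look like the sketch below. It rebuilds q5 from the posted table (assuming the columns are named Var1, by, variable and value as shown), and the names likert, q5_pct and pct are my own, introduced only for illustration. The theme() text-size settings from the question are omitted for brevity.

```r
library(ggplot2)
library(dplyr)

# Rebuild q5 from the table in the question. expand.grid() varies its first
# argument fastest, which matches the row order shown there.
likert <- c("Strongly disagree", "Somewhat disagree",
            "Neither agree nor disagree", "Somewhat agree", "Strongly agree")
q5 <- expand.grid(by = c("Cluster 1 (n = 9)", "Cluster 2 (n = 15)"),
                  Var1 = likert,
                  variable = c("A", "B", "C"),
                  stringsAsFactors = FALSE)
q5$value <- c(0, 0, 0, 0, 2, 0, 1, 0, 6, 15,   # question A
              1, 0, 0, 0, 1, 0, 4, 1, 3, 14,   # question B
              0, 0, 0, 0, 3, 0, 1, 3, 5, 12)   # question C

# Share of responses within each cluster (by) and question (variable)
q5_pct <- q5 %>%
  group_by(by, variable) %>%
  mutate(pct = value / sum(value)) %>%
  ungroup()

# Same layout as the original count plot, with pct on the y axis
p5 <- ggplot(q5_pct, aes(x = Var1, y = pct, fill = variable)) +
  geom_col(width = 0.5, position = position_dodge2(reverse = TRUE)) +
  coord_flip() +
  scale_y_continuous(limits = c(0, 1),
                     labels = scales::percent_format(accuracy = 5L)) +
  scale_x_discrete(limits = likert) +
  labs(x = "", y = "% of Responses", fill = "Question") +
  facet_grid(. ~ by)
p5
```

Because the proportions are computed before plotting, geom_col() only has to draw the values it is given, so fill and the facets behave exactly as they did in the count version.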