I have data that contains some NA values, and I am trying to make a plot as below:
library(ggplot2)
library(forcats)
library(dplyr)
library(ggpubr)

df <- data.frame(Y = rnorm(20, -6, 1),
                 X = sample(c("yes", "no", NA), 20, replace = TRUE))

dfplot <- df %>%
  mutate(X = fct_relevel(X, "yes")) %>%
  ggplot(., aes(x = X, y = Y, fill = X)) +
  geom_boxplot(size = 1, width = 0.2, show.legend = FALSE, outlier.shape = NA,
               position = position_nudge(x = 0.3)) +
  geom_jitter(show.legend = TRUE, shape = 21, width = 0.2, size = 2) +
  geom_crossbar(data = df %>% group_by(X) %>% summarise(mean = mean(Y), .groups = "keep"),
                aes(x = X, ymin = mean, ymax = mean, y = mean),
                width = 0.2, show.legend = FALSE) +
  labs(x = "", y = "%")
dfplot
However, when I try to plot only the "yes" and "no" groups, dropping the NAs with filter(X != "NA"), I cannot relevel them into the correct order with "yes" as the first column. The same happens if I use drop_na("X") or filter(!is.na(X)) instead of filter(X != "NA"):
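All three filters keep the same rows here. A real missing value never equals the literal string "NA", so X != "NA" evaluates to NA for those rows, and dplyr's filter() drops rows whose condition is NA. A minimal base-R sketch of that logic, where the vector x is a made-up example and which() stands in for filter()'s "keep only TRUE" rule:

```r
# Hypothetical example vector with one real missing value
x <- c("yes", NA, "no")

# Comparing with the string "NA" gives TRUE, NA, TRUE: the missing entry stays NA
cmp <- x != "NA"
print(cmp)

# filter() keeps rows where the condition is TRUE; which() skips the NA the same way
kept_by_string <- x[which(cmp)]
kept_by_is_na  <- x[!is.na(x)]

identical(kept_by_string, kept_by_is_na)
```

So the choice of filter is not what breaks the ordering; the explicit !is.na(X) form is simply clearer about the intent.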
dfplot <- df %>%
  filter(X != "NA") %>%
  mutate(X = fct_relevel(X, "yes")) %>%
  ggplot(., aes(x = X, y = Y, fill = X)) +
  geom_boxplot(size = 1, width = 0.2, show.legend = FALSE, outlier.shape = NA,
               position = position_nudge(x = 0.3)) +
  geom_jitter(show.legend = TRUE, shape = 21, width = 0.2, size = 2) +
  geom_crossbar(data = df %>% group_by(X) %>% summarise(mean = mean(Y), .groups = "keep"),
                aes(x = X, ymin = mean, ymax = mean, y = mean),
                width = 0.2, show.legend = FALSE) +
  labs(x = "", y = "%")
dfplot
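I think the reason is that geom_crossbar() is still given the original df: its summary is computed without removing the NA values, and its X column is the untouched character vector rather than the filtered, releveled factor used by the other layers.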
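To see the mismatch, compare the x values each layer receives. The sketch below uses base R only: relevel(factor(...), ref = "yes") stands in for fct_relevel(X, "yes"), and the names main_x and crossbar_x are illustrative, not part of the original code.

```r
set.seed(123456)
df <- data.frame(Y = rnorm(20, -6, 1),
                 X = sample(c("yes", "no", NA), 20, replace = TRUE),
                 stringsAsFactors = FALSE)  # explicit here; the default since R 4.0

# x values used by the boxplot and jitter layers: NAs dropped, "yes" moved first
main_x <- relevel(factor(df$X[!is.na(df$X)]), ref = "yes")

# x values used by geom_crossbar() in the question: the untouched column
crossbar_x <- df$X

levels(main_x)      # the level order the main layers carry
class(crossbar_x)   # a plain character vector, so it carries no level order
anyNA(crossbar_x)   # TRUE whenever the sample drew at least one NA
```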
It is also worth calling set.seed() at the beginning of the code block so that the random data are fully reproducible.
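set.seed() fixes the state of R's random number generator, so the rnorm() and sample() calls that build df return the same values on every run (given the same R version and RNG settings; sample() changed its default algorithm in R 3.6.0). A minimal check:

```r
# Reset the seed before each draw: the same seed gives the same "random" data
set.seed(123456)
first <- sample(c("yes", "no", NA), 20, replace = TRUE)

set.seed(123456)
second <- sample(c("yes", "no", NA), 20, replace = TRUE)

identical(first, second)  # TRUE
```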
The following should produce a plot with 'yes' and 'no' in the correct order:
library(ggplot2)
library(forcats)
library(dplyr)
library(ggpubr)

set.seed(123456)
df <- data.frame(Y = rnorm(20, -6, 1),
                 X = sample(c("yes", "no", NA), 20, replace = TRUE))

dfplot <- df %>%
  filter(!is.na(X)) %>%
  mutate(X = fct_relevel(X, "yes")) %>%
  ggplot(., aes(x = X, y = Y, fill = X)) +
  geom_boxplot(size = 1, width = 0.2, show.legend = FALSE, outlier.shape = NA,
               position = position_nudge(x = 0.3)) +
  geom_jitter(show.legend = TRUE, shape = 21, width = 0.2, size = 2) +
  # the crossbar summary must drop the NA group as well
  geom_crossbar(data = df %>% filter(!is.na(X)) %>% group_by(X) %>%
                  summarise(mean = mean(Y), .groups = "keep"),
                aes(x = X, ymin = mean, ymax = mean, y = mean),
                width = 0.2, show.legend = FALSE) +
  labs(x = "", y = "%")
dfplot
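A variation on the same fix, sketched under the assumption that it is fine to build the cleaned data once and hand it to every layer: the crossbar summary is then derived from data that already has the factor levels in the desired order, so there is a single source of truth for the x order. The data preparation is base R (factor(..., levels = ) and tapply() swapped in for fct_relevel() and summarise()); df_clean, m and means are illustrative names, and the plotting step only runs if ggplot2 is installed.

```r
set.seed(123456)
df <- data.frame(Y = rnorm(20, -6, 1),
                 X = sample(c("yes", "no", NA), 20, replace = TRUE),
                 stringsAsFactors = FALSE)

# Drop NAs and fix the level order once
df_clean <- df[!is.na(df$X), ]
df_clean$X <- factor(df_clean$X, levels = c("yes", "no"))

# Group means from the cleaned data; tapply() returns one value per level, in level order
m <- tapply(df_clean$Y, df_clean$X, mean)
means <- data.frame(X = factor(names(m), levels = levels(df_clean$X)),
                    mean = as.vector(m))

# Both layers now use data with the same factor levels
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_clean, aes(x = X, y = Y, fill = X)) +
    geom_boxplot(width = 0.2, show.legend = FALSE, outlier.shape = NA,
                 position = position_nudge(x = 0.3)) +
    geom_jitter(shape = 21, width = 0.2, size = 2) +
    geom_crossbar(data = means,
                  aes(x = X, y = mean, ymin = mean, ymax = mean),
                  width = 0.2, show.legend = FALSE) +
    labs(x = "", y = "%")
}
```

The boxplot's size = 1 is left out of this sketch because ggplot2 3.4.0 and later use linewidth for line widths; add whichever your installed version expects.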