r, ggplot2, forcats

Plot using fct_relevel() after dropping NAs


I have data that contains some NA values, and I am trying to make a plot as below:

library(ggplot2)
library(forcats)
library(dplyr)
library(ggpubr)

df<-data.frame(Y = rnorm(20, -6, 1),
                  X = sample(c("yes", "no", NA), 20, replace = TRUE))

dfplot<- df   %>% mutate(X=fct_relevel(X, "yes"))%>%
  ggplot(.,
         aes(x=X, y=Y, fill=X))+
  geom_boxplot(size=1, width = 0.2, show.legend = F, outlier.shape = NA,
               position=position_nudge(x=0.3))+
  geom_jitter(show.legend = T, shape=21, width=0.2, size=2)+
  geom_crossbar(data=df %>% group_by(X) %>% summarise(mean=mean(Y), .groups="keep"),
                aes(x=X, ymin=mean, ymax=mean, y=mean), width = 0.2, show.legend = F)+

  labs(x="",
       y="%")

dfplot


However, when I try to plot only the "yes" and "no" values, dropping the NAs with filter(X!="NA"), I cannot relevel them into the correct order, with "yes" as the first column. The same happens if I use drop_na("X") or filter(!is.na(X)) instead of filter(X!="NA").

dfplot<- df %>% filter(X!="NA")  %>% mutate(X=fct_relevel(X, "yes"))%>%
  ggplot(.,
         aes(x=X, y=Y, fill=X))+
  geom_boxplot(size=1, width = 0.2, show.legend = F, outlier.shape = NA,
               position=position_nudge(x=0.3))+
  geom_jitter(show.legend = T, shape=21, width=0.2, size=2)+
  geom_crossbar(data=df %>% group_by(X) %>% summarise(mean=mean(Y), .groups="keep"),
                aes(x=X,ymin=mean, ymax=mean, y=mean), width = 0.2, show.legend = F)+

  labs(x="",
       y="%")

dfplot



Solution

  • I think the reason is that 'geom_crossbar' is given its own copy of the data, built from the original 'df', without the 'NA' values being removed there as well. That layer therefore still contains the 'NA' group.

    Also, call set.seed() at the beginning of the code block to make the example fully reproducible.

    The following should produce a plot with 'yes' and 'no' in the correct order.

    library(ggplot2)
    library(forcats)
    library(dplyr)
    library(ggpubr)
    
    set.seed(123456)
    
    df <- data.frame(Y = rnorm(20, -6, 1),
                   X = sample(c("yes", "no", NA), 20, replace = TRUE))
    
    
    # Filter out the NA rows before releveling so "yes" becomes the first level
    dfplot <- df %>% filter(!is.na(X)) %>% mutate(X = fct_relevel(X, "yes")) %>%
        ggplot(.,
               aes(x = X, y = Y, fill = X)) +
        geom_boxplot(size = 1, width = 0.2, show.legend = FALSE, outlier.shape = NA,
                     position = position_nudge(x = 0.3)) +
        geom_jitter(show.legend = TRUE, shape = 21, width = 0.2, size = 2) +
        # The crossbar layer has its own data, so the NA rows must be removed here too
        geom_crossbar(data = df %>% filter(!is.na(X)) %>% group_by(X) %>%
                          summarise(mean = mean(Y), .groups = "keep"),
                      aes(x = X, ymin = mean, ymax = mean, y = mean),
                      width = 0.2, show.legend = FALSE) +
        labs(x = "",
             y = "%")
    
    dfplot
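
As a variation on the code above, one way to avoid repeating the filter is to clean the data once and reuse it in every layer. This is only a sketch under the same assumptions as the answer (same seed and simulated data); the names `df_clean` and `df_means` are made up for illustration and do not appear in the original post.

```r
library(ggplot2)
library(forcats)
library(dplyr)

set.seed(123456)

df <- data.frame(Y = rnorm(20, -6, 1),
                 X = sample(c("yes", "no", NA), 20, replace = TRUE))

# Drop the NA rows and relevel once, so every layer shares the same factor.
# df_clean and df_means are illustrative names, not from the original post.
df_clean <- df %>%
  filter(!is.na(X)) %>%
  mutate(X = fct_relevel(X, "yes"))

# Per-group means, computed from the cleaned data only.
df_means <- df_clean %>%
  group_by(X) %>%
  summarise(mean = mean(Y), .groups = "drop")

dfplot <- ggplot(df_clean, aes(x = X, y = Y, fill = X)) +
  geom_boxplot(size = 1, width = 0.2, show.legend = FALSE, outlier.shape = NA,
               position = position_nudge(x = 0.3)) +
  geom_jitter(show.legend = TRUE, shape = 21, width = 0.2, size = 2) +
  # The summary layer now receives data that has no NA group at all.
  geom_crossbar(data = df_means,
                aes(x = X, ymin = mean, ymax = mean, y = mean),
                width = 0.2, show.legend = FALSE) +
  labs(x = "", y = "%")

dfplot
```

Because both layers are built from `df_clean`, the summary layer cannot bring the NA group back, and `levels(df_clean$X)` can be inspected before plotting to confirm that "yes" comes first.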