
What are these dots in this boxplot?


[Image: boxplot]

I created a boxplot of a data set without using the geom_jitter() function, yet there are still dots in the plot. Do these dots represent statistical values, or why else are they appearing?

Here is the code I used:

pacman::p_load(tidyverse, readxl, janitor, emmeans, multcomp, magrittr,
               parameters, effectsize, multcompView, see, performance,
               conflicted, ggpubr, rstatix)
conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")
conflict_prefer("summarise", "dplyr")
conflict_prefer("extract", "magrittr")

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7") ## Color blind friendly palette
##--------------------------------------------------------------------------------------------------------------------------
## Function to open an Excel file with multiple sheets and select one of them
library(readxl)    
read_excel_allsheets <- function(filename) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(x) readxl::read_excel(filename, sheet = x))
  return(x)
}

big_tbl <- read_excel_allsheets("Mesocosms_R.xlsx")
big_tbl

phyto_plankton_tbl <- big_tbl[[14]]
##--------------------------------------------------------------------------------------------------------------------------
## Data transformation
phyto_plankton_tbl <- phyto_plankton_tbl %>%   ## assign the result, otherwise the factor conversion is discarded
  mutate(
    block = as.factor(block),
    trt = factor(trt, labels = c("-P&-F", "+P/-F", "+P/+F", "-P/+F")))

phyto_plankton_tbl <- phyto_plankton_tbl %>% 
  pivot_longer(cols = c(t0, t1, t2, t3, t4, t5),   ## Reshapes the table from wide format into long format
               names_to = "time", values_to = "PelaChl") %>%
  convert_as_factor(trt, time)
print(phyto_plankton_tbl, n = 40)
##--------------------------------------------------------------------------------------------------------------------------
## Visualization

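pelaChl_bxp <- ggplot(data = phyto_plankton_tbl, aes(x = time, y = PelaChl, fill = trt)) +
  geom_boxplot() +
  ylim(0, 50) +   ## note: removes values outside 0-50 before the boxplot statistics are computed
  scale_fill_manual(values = cbPalette) +   ## Adds color blind friendly palette
  ##  geom_jitter() +
  theme_bw()
pelaChl_bxp   ## print the plot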


Solution

  • From the ggplot2 geom_boxplot() documentation:

    The boxplot compactly displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.

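    The individual points you are seeing are "outliers": by default, geom_boxplot() draws every value lying more than 1.5 times the interquartile range (IQR) beyond the lower or upper hinge as a separate point, and the whiskers stop at the most extreme values inside that range. (As Roland has helpfully pointed out, though, "outlier" is a loaded term: people often assume they can simply remove any unusual or extreme values from a dataset, when those values may be real data points that reflect the weirdness of the underlying data.)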
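To see which values end up as dots, the default rule can be reproduced by hand. ggplot2 draws a point individually when it lies more than 1.5 times the IQR below the 25th percentile or above the 75th percentile, and it does this separately for each box (here, each time x trt combination). The following is a minimal base-R sketch, assuming that default (coef = 1.5) and that the hinges come from quantile() with its default method. The names flag_outliers, y and df are hypothetical, and the values are made up rather than taken from the mesocosm data.

```r
# Hypothetical helper: TRUE for values that a default boxplot would draw as dots,
# i.e. values more than coef * IQR below the lower hinge or above the upper hinge.
flag_outliers <- function(y, coef = 1.5) {
  # Hinges: 25th and 75th percentiles (quantile() default, type 7)
  q <- stats::quantile(y, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[2] + coef * iqr
  y < lower_fence | y > upper_fence
}

# Made-up values for a single box (not real chlorophyll data)
y <- c(4.1, 5.0, 5.3, 5.8, 6.2, 6.9, 7.4, 31.0)
y[flag_outliers(y)]   # returns 31

# The rule is applied within each box, so flag per time x trt group.
# ave() returns numbers here, hence the conversion back to logical.
df <- data.frame(
  time = rep(c("t0", "t1"), each = 8),
  trt = "+P/-F",
  PelaChl = c(y, y * 2)
)
df$outlier <- as.logical(ave(df$PelaChl, df$time, df$trt, FUN = flag_outliers))
df[df$outlier, ]
```

If the dots are unwanted, for example because geom_jitter() is added and would otherwise plot those values a second time, geom_boxplot(outlier.shape = NA) hides them; the boxes and whiskers are unchanged.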