r plotly boxplot

plotly grouped boxplot: how to specify quartiles but show outliers


I am creating a grouped boxplot using plotly. I have to specify the quantiles myself because I have a specific way of calculating them. I also want to add the outliers to the plot, as in the standard boxplot behavior where plotly calculates the quantiles internally. I am currently trying to add them as a separate trace, but they end up in the middle of the grouped boxes. Maybe there is a way of adding them in the same plotly call that adds the grouped boxes, but if there is, I can't see it. How can I make the outliers line up with the boxes? Reprex below.

Grouped box plot with outliers added on top

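library(dplyr)   # %>%, group_by(), summarise(), left_join(), filter()
library(plotly)  # plot_ly(), add_trace(), layout()

set.seed(123) # Set seed for reproducibility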

# Create the site_name column with 5 different site names, each with 40 rows
site_name <- rep(paste0("site_", 1:5), each = 40)

# Create the site_type column with 20 'A's and 20 'B's for each site
site_type <- rep(c("A", "B"), each = 20, times = 5)

# Create the value column with random numbers
# (100 values; data.frame() recycles them to fill all 200 rows)
value <- runif(100, min = 0, max = 200) # Random numbers between 0 and 200

# Combine into a data frame
df <- data.frame(site_name, site_type, value)

# Display the first few rows of the dataset
head(df, 20)

# Group by site_name and site_type, then calculate summary statistics
stats_df <- df %>%
  group_by(site_name, site_type) %>%
  summarise(
    lower_fence = quantile(value, probs = c(0.05), type = 5, na.rm = TRUE),
    q1 = quantile(value, probs = c(0.25), type = 5, na.rm = TRUE),
    median = quantile(value, probs = c(0.5), type = 5, na.rm = TRUE),
    mean = mean(value, na.rm = TRUE),
    q3 = quantile(value, probs = c(0.75), type = 5, na.rm = TRUE),
    upper_fence = quantile(value, probs = c(0.95), type = 5, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    .groups = 'drop'
  )

# Create the grouped box plot
fig <- plot_ly(
  data = stats_df,
  x = ~factor(site_name),
  color = ~factor(site_type),
  colors = c("blue","red"),
  type = "box",
  source = "boxes",
  lowerfence = ~lower_fence,
  q1 = ~q1,
  median = ~median,
  q3 = ~q3,
  upperfence = ~upper_fence,
  showlegend = TRUE  # show_legend was undefined in the reprex
) %>% 
  layout(boxmode = "group")

# Extract outliers
filtered_df<- df %>%
  left_join(stats_df, by = c("site_name", "site_type")) %>%
  filter(value < lower_fence | value > upper_fence)

# Add the outlier points
fig <- fig %>%
  add_trace(
    data = filtered_df,
    x = ~factor(site_name),  
    y = ~value,  
    color = ~factor(site_type),
    colors = c("blue", "red"),  # landuse_colors was undefined in the reprex
    type = "scatter",
    mode = "markers",
    marker = list(size = 5, opacity = 0.6),  # Customize marker appearance
    showlegend = FALSE,  # Hide legend for scatter points if desired
    inherit = FALSE
  )

# Show the figure
fig

Solution

  • You were so close! You need to set scattermode = 'group' in the call to layout().

    However, the version of Plotly.js that the R library uses by default is so old that this won't work until you update the Plotly.js dependency the R library relies on.

    After the update the points were still a bit off center, so I used boxgap to align the markers. I used a value of 1/5 because there are 6 groups, which leaves 5 between-group spaces.

    You can use arguments like boxgap and boxgroupgap to adjust the appearance, but I don't believe that scattergap or scattergroupgap were added to Plotly along with scattermode.

    The code

    I used a user-defined function to update the Plotly.js dependency. (Multiple arguments in the Plotly library don't work in R without this update, so this function could be useful for a variety of reasons...)

    fixer <- function(plt) {
      # point the Plotly.js dependency (element 5) at a newer CDN build so that all code works
      plt$dependencies[[5]]$src$file = NULL
      plt$dependencies[[5]]$src$href = "https://cdn.plot.ly"
      plt$dependencies[[5]]$script = "plotly-2.33.0.min.js"
      plt$dependencies[[5]]$local = FALSE
      plt$dependencies[[5]]$package = NULL
      plt
    }
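
    The hard-coded index 5 in fixer() depends on the order in which the R library registers its dependencies. As a hedged sketch, the variant below looks the entry up by name instead. The helper name fixer_by_name is hypothetical, and the dependency name "plotly-main" is an assumption about how the R library labels its Plotly.js bundle; you can list the names on your own plot with sapply(plt$dependencies, `[[`, "name") before relying on it.

    ```r
    # Hypothetical variant of fixer(): find the Plotly.js dependency by name
    # rather than by position. "plotly-main" is an assumed name; verify it
    # against your installed version of the R library.
    fixer_by_name <- function(plt, version = "2.33.0", dep_name = "plotly-main") {
      # Collect the name of every registered dependency (NA if it has none)
      dep_names <- vapply(plt$dependencies, function(d) {
        if (is.null(d$name)) NA_character_ else as.character(d$name)
      }, character(1))

      idx <- which(dep_names == dep_name)
      if (length(idx) != 1) {
        stop("Expected exactly one dependency named '", dep_name, "'")
      }

      # Same changes as fixer(): drop the local file and load from the CDN
      dep <- plt$dependencies[[idx]]
      dep$src <- list(href = "https://cdn.plot.ly")
      dep$script <- paste0("plotly-", version, ".min.js")
      dep$local <- FALSE
      dep$package <- NULL
      plt$dependencies[[idx]] <- dep
      plt
    }
    ```

    It is used the same way as fixer(), as the last step of the pipe. If the name is not found it stops with an error instead of silently editing the wrong dependency.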
    

    The boxplot, markers, layout and the updated dependency

    If you comment out, hide, or remove fixer(), you'll see that the call for scattermode is ignored.

    plot_ly(data = stats_df, x = ~site_name, color = ~site_type,   # boxes
             colors = c("blue","red"), type = "box",
             lowerfence = ~lower_fence, q1 = ~q1, median = ~median,
             q3 = ~q3, upperfence = ~upper_fence) %>% 
        add_markers(data = filtered_df, x = ~site_name, y = ~value, # markers
                    color = ~site_type, showlegend = FALSE,
                    marker = list(size = 5, opacity = 0.6)) %>% 
        layout(boxmode = "group", scattermode = "group", boxgap = 1/5) %>% # align
        fixer()                                   # update Plotly dependency
    

    The boxplot with aligned outliers:

    [Image: aligned boxplot markers]