Search code examples
rggplot2plyrggproto

ggplot extension function to plot a superimposed mean in a scatterplot


I am trying to create a custom function that extends ggplot2. The goal of the function is to superimpose a mean with horizontal and vertical standard errors. The code below does the entire thing.

library(plyr)
library(tidyverse)

summ <- ddply(mtcars,.(),summarise,
              dratSE = sqrt(var(drat))/length(drat),
              mpgSE = sqrt(var(mpg))/length(mpg),
              drat = mean(drat),
              mpg = mean(mpg))

ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
  geom_point(shape = 21, fill = 'black', color = 'white', size = 3) + 
  geom_errorbarh(data = summ, aes(xmin = drat - dratSE, xmax = drat + dratSE)) +
  geom_errorbar(data = summ, aes(ymin = mpg - mpgSE, ymax = mpg+mpgSE), width = .1) +
  geom_point(data = summ, color='red',size=4) 

Ideally, it would only take a function such as geom_scattermeans() to do this whole thing. But I am not sure how the aesthetics get transferred into subsequent geom functions from ggplot().

Also I've had difficulties in making a function that receives column names as argument and making it work with ddply(). enter image description here


Solution

  • Since my first answer is still the easier solution, I decieded to keep it. This answer should get OP closer to their goal.

    Building a ggproto object can be cumbersome depending on what you are trying to do. In your case you are combining 3 ggproto Geoms classes together with the possibility of a new Stat.

    The three Geoms are:

    • GeomErrorbar
    • GeomErrorbarh
    • GeomPoint

    To get started, sometimes you just need to inherit from one of the classes and overwrite the method, but to pool the three together you will need to do more work.

    Lets first consider how each of these Geoms draw their grid objects. Depending on the Geom it is in one of these functions draw_layer(), draw_panel(), and draw_group(). Fortunately, each of the geoms we want to use only use draw_panel() which mean a bit less work for us - we will just call these methods directly and build a new grobTree object. We will just need to be careful that all the correct parameters are making it to our new Geom's draw_panel() method.

    Before we get to writing our own draw_panel, we have to first consider setup_params() and setup_data() functions. Occasionally, these will modify the data right out the gate. These steps are usually helpful to have automatic processing here and are often used to standardize/transform the data. A good example is GeomTile and GeomRect, they are essentially the same Geoms but their setup_data() functions differ because they are parameterized differently.

    Lets assume you only want to assign an x and a y aesthetic, and leave the calculations of xmin, ymin, xmax, and ymax to the geoms/stats.

    Fortunately, GeomPoint just returns the data with no modifications, so we will need to incorporate GeomErrorbar and GeomErrorbarh's setup_data() first. To skip some steps, I am just going to make a new Stat that will take care of transforming those values for us within a compute_group() method.

    A note here, GeomErrorbar and GeomErrorbarh allow for another parameter to be included - width and height respectively, which controls how wide the flat portions of the error bars are.

    also, within these functions, each will make their own xmin, xmax, ymin, ymax - so we will need to distinguish these parameters.

    First load required information into the namespace

    library(ggplot2)
    library(grid)
    "%||%" <- ggplot2:::`%||%`
    

    Start with new Stat, I've decided to call it a PointError

    StatPointError <- ggproto(
      "StatPointError",
      Stat,
      #having `width` and `height` as named parameters here insure
      #that they will be available to the `Stat` ggproto object.
      compute_group = function(data, scales, width = NULL, height = NULL){
        data$width <- data$width %||% width %||% (resolution(data$x, FALSE)*0.9)
        data$height <- data$height %||% height %||% (resolution(data$y, FALSE)*0.9)
        
        data <- transform(
          data,
          x = mean(x),
          y = mean(y),
          # positions for flat parts of vertical error bars
          xmin = mean(x) - width /2,
          xmax = mean(x) + width / 2,
          width = NULL, 
          # y positions of vertical error bars
          ymin = mean(y) - sqrt(var(y))/length(y),
          ymax = mean(y) + sqrt(var(y))/length(y),
          #positions for flat parts of horizontal error bars
          ymin_h = mean(y) - height /2,
          ymax_h = mean(y) + height /2,
          height = NULL,
          # x positions of horizontal error bars
          xmin_h = mean(x) - sqrt(var(x))/length(x),
          xmax_h = mean(x) + sqrt(var(x))/length(x)
        )
        unique(data)
      }
    )
    

    Now for the fun part, the Geom, again I'm going for PointError as a consistent name.

    GeomPointError <- ggproto(
      "GeomPointError",
      GeomPoint,
      #include some additional defaults
      default_aes = aes(
        shape = 19,
        colour = "black",
        size = 1.5, # error bars have defaults of 0.5 - you may want to add another parameter?
        fill = NA,
        alpha = NA,
        linetype = 1,
        stroke = 0.5, # for GeomPoint
        width = 0.5,  # for GeomErrorbar
        height = 0.5, # for GeomErrorbarh
      ),
      draw_panel = function(data, panel_params, coord, width = NULL, height = NULL, na.rm = FALSE) {
        #make errorbar grobs
        data_errbar <- data
        data_errbar[["size"]] <- 0.5
        errorbar_grob <- GeomErrorbar$draw_panel(data = data_errbar,
                                                 panel_params = panel_params, coord = coord, 
                                                 width = width, flipped_aes = FALSE)
        #re-parameterize errbarh data
        data_errbarh <- transform(data, 
                                  xmin = xmin_h, xmax = xmax_h, ymin = ymin_h, ymax = ymax_h,
                                  xmin_h = NULL, xmax_h = NULL, ymin_h = NULL, ymax_h = NULL,
                                  size = 0.5)
        #make errorbarh grobs
        errorbarh_grob <- GeomErrorbarh$draw_panel(data = data_errbarh,
                                                   panel_params = panel_params, coord = coord,
                                                   height = height)
        point_grob <- GeomPoint$draw_panel(data = data, panel_params = panel_params,
                                           coord = coord, na.rm = na.rm)
        gt <- grobTree(
          errorbar_grob,
          errorbarh_grob,
          point_grob, name = 'geom_point_error')
        gt
      }
    )
    

    Last, we need a function for the user to call that will make a Layer object.

    geom_point_error <- function(mapping = NULL, data = NULL,
                                 position = "identity",
                                 ...,
                                 na.rm = FALSE,
                                 show.legend = NA,
                                 inherit.aes = TRUE) {
      layer(
        data = data,
        mapping = mapping,
        stat = StatPointError,
        geom = GeomPointError,
        position = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(
          na.rm = na.rm,
          ...
        )
      )
    }
    

    Now we can test if this is working properly

    ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
      geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
      geom_point_error(color = "red", width = .1, height = .3)
    

    ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
      geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
      geom_point_error(aes(color = hp>100))
    

    Created on 2021-05-18 by the reprex package (v1.0.0)

    There is obviously so much more you could do with this, from including additional default aesthetics such that you could control the color and size of the lines/points separately (may want to override GeomPointError$setup_data() to insure everything maps correctly).

    Finially, this geom is pretty naive in that it assumes the x and y data mappings are continuous. It still works with mixing continuous and discrete, but looks a bit funky

    ggplot(mpg, aes(cty, model)) +
        geom_point() +
        geom_point_error(color = 'red')