
What is the functionality of "non_missing_aes" in ggproto of ggplot2?


I'm writing extensions for ggplot2 and found a newly added non_missing_aes parameter in ggproto that is not explained in the official ggplot2 documentation or in the official guide to extending ggplot2. Could anyone tell me what it does, and how it differs from required_aes? Thanks!


Solution

    TLDR

    required_aes specifies the aesthetic mappings that must exist before anything in a geom_*() or stat_*() call is passed into a ggproto object, while non_missing_aes specifies the aesthetic mappings that must exist after the necessary processing steps performed by the functions defined in that ggproto object.

    Longer explanation

    Since you are writing extensions, I assume you are familiar with how a data frame is passed into ggplot() and inherited by each relevant layer (or passed directly into each layer), then passed into the relevant Geom / Stat ggproto objects and transformed along the way.

    non_missing_aes, along with required_aes, is referenced as part of this data transformation process, in the Geom$handle_na and Stat$compute_layer functions, which all other Geoms and Stats inherit by default.

    More specifically, non_missing_aes is found within the remove_missing function as follows (I added the function argument names below for clarity):

    remove_missing(df = data, 
                   na.rm = params$na.rm, 
                   vars = c(self$required_aes, self$non_missing_aes), 
                   name = snake_class(self))
    

    From ?remove_missing, we can tell that this is where all columns listed in either required_aes or non_missing_aes are checked, and rows with missing values in any of the checked columns are dropped from the data frame.
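
    As a minimal sketch of that row-dropping behaviour, remove_missing can be called directly on a toy data frame. The data frame and its column names here are made up for illustration; only the columns named in vars are checked.

```r
library(ggplot2)

# Toy data (hypothetical): NAs are scattered across three columns
df <- data.frame(
  x = c(1, 2, NA),
  y = c(1, NA, 3),
  z = c(NA, 1, 1)
)

# Check only x and y, mimicking vars = c(required_aes, non_missing_aes).
# na.rm = TRUE drops the rows silently; na.rm = FALSE would also warn.
out <- remove_missing(df, na.rm = TRUE, vars = c("x", "y"))

# Rows 2 and 3 are dropped (NA in y and x respectively).
# Row 1 is kept even though z is NA, because z is not in vars.
print(out)
```

    If a column named in vars is absent from the data frame, it cannot be checked at this point, which is why the timing of this call relative to setup_data() and the filling of defaults matters.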

    But why use non_missing_aes? Why not specify all such columns in required_aes? A look at some Geoms / Stats that actually specify something in non_missing_aes suggests why:

    GeomBar (comments below are from the actual code found on GitHub):

    required_aes = c("x", "y"),
    
    # These aes columns are created by setup_data(). They need to be listed here so
    # that GeomRect$handle_na() properly removes any bars that fall outside the defined
    # limits, not just those for which x and y are outside the limits
    non_missing_aes = c("xmin", "xmax", "ymin", "ymax"),
    ...
    

    GeomRaster:

    required_aes = c("x", "y"),
    non_missing_aes = "fill",
    default_aes = aes(fill = "grey20", alpha = NA),
    ...
    

    GeomSegment:

    required_aes = c("x", "y", "xend", "yend"),
    non_missing_aes = c("linetype", "size", "shape"),
    default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA),
    ...
    

    GeomPoint:

    required_aes = c("x", "y"),
    non_missing_aes = c("size", "shape", "colour"),
    default_aes = aes(shape = 19, colour = "black", size = 1.5, fill = NA,
                      alpha = NA, stroke = 0.5),
    ...
    

    StatYdensity (note that this Stat is usually used with geom_violin, which specifies weight = 1 in its default_aes):

    required_aes = c("x", "y"),
    non_missing_aes = "weight",
    ...
    

    In each case, the aesthetic mappings listed in non_missing_aes are ones that are NOT necessarily specified by the user at the point a ggplot object is generated, so the corresponding columns may not exist in the data frame from the outset.

    For GeomBar, the xmin / xmax / ymin / ymax columns are only calculated from the given data frame during GeomBar$setup_data(). For the rest, the non_missing_aes mappings are included in their respective Geoms' default_aes. They could exist from the outset if the user included something like colour = <some variable in the data> in geom_*(); otherwise, the columns are created at a later stage and filled with default values.
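
    One way to see these late-created columns is to compare the input data with what a layer holds after the plot is built. This is a rough sketch with a made-up data frame; layer_data() returns the fully processed layer data, so it shows the end result rather than the exact moment each column is added.

```r
library(ggplot2)

# Toy data (hypothetical): one bar per group
df <- data.frame(g = c("a", "b"), n = c(3, 5))

p <- ggplot(df, aes(x = g, y = n)) + geom_col()

# Data as seen by the layer after building the plot
ld <- layer_data(p, 1)

# The input has no rectangle corners ...
bounds <- c("xmin", "xmax", "ymin", "ymax")
print(bounds %in% names(df))

# ... but the built layer data does, along with defaults such as fill
print(bounds %in% names(ld))
print("fill" %in% names(ld))
```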

    In either case, by the time the data frame is evaluated by the remove_missing function, all columns in either required_aes or non_missing_aes should be present. But since not all of them were supplied by the user from the outset, we can't list them all in required_aes: any aesthetic mapping listed in required_aes but not present in the geom_*() / stat_*() call would trigger an error:

    Error: geom_* requires the following missing aesthetics: some_aes_or_other
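
    The contrast can be reproduced with geom_point(), as a sketch: leaving out y (listed in required_aes) fails when the plot is built, while leaving out colour (listed in non_missing_aes) is fine because the default fills it in. The exact wording of the error message may differ between ggplot2 versions, so the code only checks that an error occurs.

```r
library(ggplot2)

# y is in GeomPoint$required_aes but is not mapped here,
# so building the plot should raise an error.
p_bad <- ggplot(mtcars, aes(x = wt)) + geom_point()
res <- tryCatch(ggplot_build(p_bad), error = function(e) e)
print(inherits(res, "error"))

# colour is in non_missing_aes, not required_aes: it can be left unmapped,
# and the column is present in the processed layer data anyway.
p_ok <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
ld_ok <- layer_data(p_ok, 1)
print("colour" %in% names(ld_ok))
```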