I'm writing extensions for ggplot2 and found a newly added non_missing_aes field in ggproto objects that isn't explained in either the official ggplot2 documentation or the official guide to extending ggplot2. Could anyone tell me what it does, and how it differs from required_aes? Thanks!
required_aes specifies the aesthetic mappings that must exist before anything in a geom_*() or stat_*() call is passed into a ggproto object, while non_missing_aes specifies the aesthetic mappings that must exist after the necessary processing steps carried out by the various functions defined in that ggproto object.
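To make the distinction concrete, here is a minimal sketch of an extension Geom that sets both fields. GeomSimplePoint and geom_simple_point() are hypothetical names, adapted from the example in the "Extending ggplot2" vignette; they are not part of ggplot2. Only x and y have to be mapped by the user, while colour is listed in non_missing_aes and is supplied later from default_aes.

```r
library(ggplot2)
library(grid)

# Hypothetical Geom for illustration only, adapted from the GeomSimplePoint
# example in the "Extending ggplot2" vignette; it is not part of ggplot2.
GeomSimplePoint <- ggproto("GeomSimplePoint", Geom,
  # Checked against the layer's columns before setup_data() runs;
  # a missing one aborts the build.
  required_aes = c("x", "y"),
  # Only used later, when handle_na() drops rows with NA in these columns.
  non_missing_aes = "colour",
  default_aes = aes(shape = 19, colour = "black"),
  draw_key = draw_key_point,
  draw_panel = function(data, panel_params, coord) {
    coords <- coord$transform(data, panel_params)
    pointsGrob(coords$x, coords$y,
               pch = coords$shape,
               gp = gpar(col = coords$colour))
  }
)

# Thin layer constructor in the usual geom_*() style (also hypothetical).
geom_simple_point <- function(mapping = NULL, data = NULL, stat = "identity",
                              position = "identity", na.rm = FALSE,
                              show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    geom = GeomSimplePoint, mapping = mapping, data = data, stat = stat,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

# Only x and y are mapped; the colour column is added later from default_aes.
p <- ggplot(mtcars, aes(wt, mpg)) + geom_simple_point()
pts <- layer_data(p)
head(pts[, c("x", "y", "colour")])
```

Listing colour in required_aes instead would make this plot fail to build, because the user never mapped it.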
Since you are writing extensions, I assume you are familiar with how a data frame is passed into ggplot(), inherited by each relevant layer (or passed directly into a layer), then passed into the relevant Geom / Stat ggproto objects and transformed along the way.
non_missing_aes, along with required_aes, is referenced during this transformation process in the Geom$handle_na and Stat$compute_layer methods, which all other Geoms and Stats inherit by default.
More specifically, non_missing_aes is passed to the remove_missing function. In Geom$handle_na the call looks like this (I added the argument names below for clarity):
```r
remove_missing(df = data,
               na.rm = params$na.rm,
               vars = c(self$required_aes, self$non_missing_aes),
               name = snake_class(self))
```
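From ?remove_missing, we can tell that this is where all columns listed in either required_aes or non_missing_aes are checked, and rows with missing values in any of the checked columns are dropped from the data frame (with a warning, unless na.rm = TRUE). Listed columns that are not present in the data frame are simply skipped.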
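The check can be reproduced by calling the exported remove_missing() directly on a toy data frame. The data frame and the two vectors standing in for self$required_aes and self$non_missing_aes are made up for illustration; only the shape of the call mirrors Geom$handle_na.

```r
library(ggplot2)

# A toy data frame standing in for a layer's data just before drawing.
df <- data.frame(
  x      = c(1, 2, 3, 4),
  y      = c(1, NA, 3, 4),
  colour = c("black", "black", NA, "black")
)

# Made-up stand-ins for self$required_aes and self$non_missing_aes.
# "size" and "shape" are deliberately absent from df.
required_aes    <- c("x", "y")
non_missing_aes <- c("size", "shape", "colour")

# Same call shape as in Geom$handle_na(); na.rm = TRUE suppresses the warning.
cleaned <- remove_missing(df, na.rm = TRUE,
                          vars = c(required_aes, non_missing_aes),
                          name = "geom_example")
cleaned
```

Row 2 is dropped for its missing y and row 3 for its missing colour, while the absent size and shape columns are ignored rather than causing an error.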
But why use non_missing_aes? Why not list all such columns in required_aes? A look at some Geoms / Stats that actually specify something in non_missing_aes suggests why:
GeomBar (comments below are from the actual code found on GitHub):
```r
required_aes = c("x", "y"),
# These aes columns are created by setup_data(). They need to be listed here so
# that GeomRect$handle_na() properly removes any bars that fall outside the defined
# limits, not just those for which x and y are outside the limits
non_missing_aes = c("xmin", "xmax", "ymin", "ymax"),
...
```

```r
required_aes = c("x", "y"),
non_missing_aes = "fill",
default_aes = aes(fill = "grey20", alpha = NA),
...
```

```r
required_aes = c("x", "y", "xend", "yend"),
non_missing_aes = c("linetype", "size", "shape"),
default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA),
...
```

```r
required_aes = c("x", "y"),
non_missing_aes = c("size", "shape", "colour"),
default_aes = aes(shape = 19, colour = "black", size = 1.5, fill = NA,
                  alpha = NA, stroke = 0.5),
...
```
StatYdensity (note that this Stat is usually used with geom_violin, whose GeomViolin specifies weight = 1 in its default_aes):
```r
required_aes = c("x", "y"),
non_missing_aes = "weight",
...
```
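These fields can change between ggplot2 releases, so it may be worth checking them in the installed version rather than relying on the excerpts above. aes_fields() below is a hypothetical helper, not a ggplot2 function; it only reads fields that ggproto objects expose through $.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2) that collects the aesthetic-related
# fields of a Geom or Stat as defined in the installed ggplot2 version.
aes_fields <- function(obj) {
  list(
    required_aes    = obj$required_aes,
    non_missing_aes = obj$non_missing_aes,
    default_aes     = names(obj$default_aes)
  )
}

str(aes_fields(GeomBar))
str(aes_fields(GeomPoint))
str(aes_fields(StatYdensity))
str(aes_fields(GeomViolin))
```

Fields that a Geom or Stat does not set are inherited from the base Geom / Stat, which is why every object returns something for each entry.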
In each case, the aesthetics listed in non_missing_aes are ones that are NOT necessarily specified by the user when the ggplot object is created, so the corresponding columns may not exist in the data frame at the outset.
For GeomBar, the xmin / xmax / ymin / ymax columns are only calculated from the given data frame in GeomBar$setup_data(). For the rest, the non_missing_aes aesthetics are included in the default_aes of their respective Geoms (for StatYdensity, in GeomViolin's default_aes). So while these columns could exist from the outset if the user mapped them, e.g. with aes(colour = <some variable in the data>) in geom_*(), otherwise they are created at a later stage and filled with default values.
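The GeomBar case can be checked by calling its handle_na() method directly on hand-made data. This is a sketch that assumes the installed ggplot2 still lists xmin / xmax / ymin / ymax in GeomBar$non_missing_aes, as in the excerpt above; the data frame is invented to mimic what GeomBar holds after setup_data(), with one bar edge censored to NA as if it fell outside the x limits.

```r
library(ggplot2)

# Inspect the field in the installed version first.
GeomBar$non_missing_aes

# Hand-made data shaped like GeomBar's data after setup_data() has added
# xmin/xmax/ymin/ymax. The third bar's xmax is NA (as if it fell outside
# the x limits), while its x and y are both fine.
bars <- data.frame(
  x    = c(1, 2, 3),
  y    = c(2, 5, 3),
  xmin = c(0.55, 1.55, 2.55),
  xmax = c(1.45, 2.45, NA),
  ymin = c(0, 0, 0),
  ymax = c(2, 5, 3)
)

# handle_na() checks required_aes and non_missing_aes together, so the third
# bar is dropped even though a check on x and y alone would have kept it.
kept <- GeomBar$handle_na(bars, list(na.rm = TRUE))
kept
```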
In either case, by the time the data frame reaches the remove_missing function, the columns in required_aes and non_missing_aes should generally be present. But since not all of them were supplied by the user at the outset, we can't list them all in required_aes: any aesthetic listed in required_aes but not mapped in the geom_*() / stat_*() call would trigger an error:
```
Error: geom_* requires the following missing aesthetics: some_aes_or_other
```
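The contrast can be sketched with two hypothetical Geoms (neither is part of ggplot2) that differ only in where colour is listed. Only the build step is exercised, via layer_data(), so draw_panel() just returns an empty grob. Depending on the ggplot2 version, the printed error may be wrapped in a "Problem while setting up geom" message with the missing-aesthetics error as its cause.

```r
library(ggplot2)

# Two hypothetical Geoms that differ only in where "colour" is listed.
GeomColourNonMissing <- ggproto("GeomColourNonMissing", Geom,
  required_aes = c("x", "y"),
  non_missing_aes = "colour",
  default_aes = aes(colour = "black"),
  draw_panel = function(data, panel_params, coord) grid::nullGrob()
)

GeomColourRequired <- ggproto("GeomColourRequired", Geom,
  required_aes = c("x", "y", "colour"),
  default_aes = aes(colour = "black"),
  draw_panel = function(data, panel_params, coord) grid::nullGrob()
)

base <- ggplot(mtcars, aes(wt, mpg))  # colour is never mapped

# Builds: colour is not required up front and is filled in from default_aes.
ok <- layer_data(base + layer(geom = GeomColourNonMissing, stat = "identity",
                              position = "identity"))

# Fails: the required_aes check runs before default_aes values are added.
err <- tryCatch(
  layer_data(base + layer(geom = GeomColourRequired, stat = "identity",
                          position = "identity")),
  error = function(e) e
)
print(err)
```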