Layered ggplot - avoid dataset duplication


I'm creating a layered ggplot2 plot, where I first specify global data and mapping attributes in the ggplot2::ggplot() function. When I go to add subsequent layers, as long as these layers only have aesthetics or data that were globally defined, this works fine.

However, if I need to add new data or a new aesthetic mapping in a later layer, it appears to wipe out the global data: ggplot can no longer find it. Is there a way to use both new data for a specific layer and the globally defined data?

Example:

base_plot <- ggplot2::ggplot(
    data = dplyr::select(df, c("County", "geometry")),
    mapping = ggplot2::aes(geometry = geometry)
)
# works fine
new_plot <- 
    base_plot +
    ggiraph::geom_sf_interactive(
        mapping = aes(geometry = geometry, tooltip = County)
    )

base_plot <- ggplot2::ggplot(
    data = dplyr::select(df, c("County", "geometry")),
    mapping = ggplot2::aes(geometry = geometry)
)
# fails, with error "object 'geometry' not found"
new_plot <- 
    base_plot +
    ggiraph::geom_sf_interactive(
        data = df_with_just_metric_column,
        mapping = aes(geometry = geometry, tooltip = County, fill = metric)
    )

It appears that the new layer looks for all of these fields only within the new data we have provided (df_with_just_metric_column), but I don't want to have to duplicate these fields in that data frame. Is there a way to get ggplot/ggiraph to also look in the globally defined data that was added to the plot (i.e. grabbing geometry and County from the globally defined data, and metric from the new data)?

While I've seen some similar questions posted many years ago, I wasn't able to find an answer that:

  1. is fairly recent, reflecting the current capabilities of ggplot2.

  2. addresses this specific situation where we are trying to combine existing data already added to the plot with new data used for a new layer.


Solution

  • As long as the new data has the same number of rows and the same row order as the global data, you can use cbind or dplyr::bind_cols to add the new columns on the fly via the data= argument, like so.

    Using a minimal reproducible example based on the default example from ?geom_sf:

    library(ggiraph)
    library(ggplot2)
    library(dplyr, warn.conflicts = FALSE)
    
    ## Example data
    df <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    df <- df[c("AREA", "NAME", "CNTY_ID")]
    names(df) <- c("metric", "County", "Id", "geometry")
    
    df_with_just_metric_column <- sf::st_drop_geometry(df["metric"])
    
    
    base_plot <- ggplot(
      data = dplyr::select(df, c("County", "geometry")),
      mapping = ggplot2::aes(geometry = geometry)
    )
    
    # Option 1: Add the column to the data using cbind or dplyr::bind_cols
    new_plot <- base_plot +
      ggiraph::geom_sf_interactive(
        data = ~ cbind(., df_with_just_metric_column),
        mapping = aes(geometry = geometry, tooltip = County, fill = metric)
      )
    
    girafe(ggobj = new_plot)
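
Option 1 relies on the fact that a layer's data argument may be a function, or a one-sided formula that ggplot2 turns into a function. It is called with the plot's global data, and the data frame it returns becomes the layer's data. A minimal sketch of that mechanism with the built-in mtcars data, where extra is a hypothetical stand-in for the metric-only data frame:

```r
library(ggplot2)

# Global data and mappings, playing the role of base_plot
p_base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Hypothetical extra data: same number of rows and same row order as mtcars
extra <- data.frame(metric = seq_len(nrow(mtcars)))

# Inside the formula, `.` is the global data (mtcars);
# the value the formula returns is used as this layer's data
p <- p_base +
  geom_point(data = ~ cbind(., extra), mapping = aes(colour = metric))

# layer_data() returns the data the first layer was actually built from
built <- layer_data(p, 1)
nrow(built)
```

Because the function only sees the global data, anything else it needs (here extra) has to be available in the calling environment when the plot is built.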
    

    Personally, I would prefer a join, which requires that the global data and the new data share a key column to join by. IMHO this approach is safer because it does not depend on the row order of the data, and it also works when the "new data" covers only some rows of the global data, e.g. only some counties.

    # Option 2: Use a join, which requires a key column to join by but IMHO is safer
    base_plot2 <- ggplot(
      data = dplyr::select(df, c("County", "Id", "geometry")),
      mapping = ggplot2::aes(geometry = geometry)
    )
    
    df_with_just_metric_column2 <- sf::st_drop_geometry(df[c("Id", "metric")])
    
    new_plot2 <- base_plot2 +
      ggiraph::geom_sf_interactive(
        data = ~ merge(., df_with_just_metric_column2, by = "Id"), # or inner_join or ...
        mapping = aes(geometry = geometry, tooltip = County, fill = metric)
      )
    
    girafe(ggobj = new_plot2)
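
To illustrate why the join is less fragile than cbind, here is a small base R sketch with made-up toy data (the global, new_data and shuffled data frames and their values are purely illustrative, not taken from the example above). It also shows that merge() performs an inner join by default, so rows without a match are dropped unless all.x = TRUE is set:

```r
# Toy stand-ins for the global data and the metric-only data (invented values)
global <- data.frame(Id = c(1, 2, 3), County = c("A", "B", "C"))

# New data in a different row order and with no entry for Id 2
new_data <- data.frame(Id = c(3, 1), metric = c(30, 10))

# Inner join (the merge() default): matched by key, unmatched rows dropped
inner <- merge(global, new_data, by = "Id")

# Left join: every row of the global data is kept, missing metrics become NA
left <- merge(global, new_data, by = "Id", all.x = TRUE)

# cbind pairs rows purely by position, so reordered data is silently misaligned
shuffled <- data.frame(metric = c(30, 10, 20))  # metrics for Ids 3, 1, 2
bound <- cbind(global, shuffled)

inner
left
bound
```

In the map, an inner join would leave unmatched counties out of that layer, while a left join (all.x = TRUE, or dplyr::left_join) keeps them with a missing fill value.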