
How to swap the geom_points of a ggplot lollipop plot for lil pie charts to show proportion of data points assessed for the plot


I've made a plot I am mostly happy with, using Tidyverse in R, but the plot needs to show a bit more information and I haven't managed to work out how to do that yet.

The point of the plot is to show how a bunch of cells from three different animals were algorithmically sorted and bunched together according to their biology. Each animal had a lot of different celltypes, and there were a lot of outputted clusters of cells. I am plotting one outputted cluster: after looking at all the cells from each animal that were sorted into this cluster, I chose to show the top 5 celltype names from each source animal that made it into the cluster. The plot shows this nicely (to me, at least), but it doesn't show whether ALL the cells of a given source celltype were bundled into this new cluster, or half, or almost none, etc.

Here is the code I used, and the plot I got (and mostly like!).

library(tidyverse)
# create the contents of the toy dataset, then add together
species_organ <- c(rep("frog", 5),
                   rep("bat", 5),
                   rep("bird", 5)
)
annotation <- c("celltype1", "celltype2", "celltype3", "celltype4", "celltype5",
                "celltypeA", "celltypeB", "celltypeC", "celltypeD", "celltypeE",
                "celltypeAlpha", "celltypeBeta", "celltypeGamma", "celltypeDelta", "celltypeEpsilon"
)
count_in_integratedcluster <- c(253, 245, 226, 187, 185, 42, 18, 17, 11, 9, 58, 16, 8, 8, 7)
annotation_count_in_source_dataset <- c(413, 312, 349, 410, 233, 195, 198, 56, 166, 238, 82, 68, 270, 226, 81)
fraction_of_total_celltype_abundance <- count_in_integratedcluster / annotation_count_in_source_dataset

fake_dataframe <- data.frame(species_organ, annotation, count_in_integratedcluster, annotation_count_in_source_dataset, fraction_of_total_celltype_abundance)

# a few other things to decorate the plot with
how_many_cells_in_this_integrated_cluster <- 5056
cluster_name <- "cluster6"

# now we make a lollipop plot
plot_lollipop_faceted.top5 <- ggplot(fake_dataframe) +
  geom_segment( aes(x=annotation, xend=annotation, y=0, yend=count_in_integratedcluster), color="grey") +
  geom_point( aes(x=annotation, y=count_in_integratedcluster, color=species_organ), size=3 ) +
  coord_flip()+
  theme(
    legend.position = "none",
    panel.border = element_blank(),
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8)
  ) +
  xlab("") +
  ylab("How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)") +
  facet_wrap(~species_organ, ncol=1, scales="free_y") +
  labs(title = paste(paste("integrated", cluster_name, sep = " "), ",", how_many_cells_in_this_integrated_cluster, "total cells"), 
       subtitle = "In this integrated cluster, see what cells contribute per species")

(plot I mostly like but which needs improvement)

An "easy" graphical fix would be to replace the geom_point with a cute little pie chart, with colour filling to report whether 90% of "bird muscle cells" or just 10% of "bird muscle cells" were ultimately apportioned to this cluster by the algorithm.

Here is a pencil sketch of what the graph could look like if I made the swap I am looking for.

pencil sketch of improved plot

Any solution has to be in R, and I would appreciate Tidyverse-based approaches but I'm willing to try other approaches that convey the desired set of information.

I've looked at other related questions and unfortunately couldn't manage to make the suggested methods work for me, or else the suggested solution doesn't seem to be useful in my scenario; so far, I've examined:

  • R::ggplot2::geom_points: how to swap points with pie charts? (scatterpie docs didn't help me make sense of what to do to implement suggestions)
  • ggplot use small pie charts as points with geom_point (the pie is nice, but I don't want to lose the other information currently conveyed by my plot already)
  • Plotting pie charts in ggplot2 (title sounds right, but content was not helpful)
  • create floating pie charts with ggplot (this is the second time I saw coord_polar() but I did not figure out how to use it after fiddling a bit with it/reading its docs)


Solution

  • We can use scatterpie to get the plot you want, but it's a bit of a pain to use. It doesn't seem to like categorical variables, so these need to be converted to numeric via factor and relabelled in the scales. It also won't play nicely with coord_flip, so you will need to rescale the count axis to get the pies circular (the code below divides the counts by 15 and multiplies the axis labels back by 15).

    So the first step is to reshape your data:

    library(tidyverse)
    library(scatterpie)
    
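    fake_dataframe <- fake_dataframe %>%
      # pie slices: share of the source celltype inside the cluster, and the rest
      rename(pos = fraction_of_total_celltype_abundance) %>%
      mutate(neg = 1 - pos) %>%
      # order the annotation levels so each species' celltypes sit together
      mutate(annotation = fct_reorder(as.factor(annotation),
                                      as.factor(species_organ),
                                      ~mean(as.numeric(.x)))) %>%
      # numeric axis position for scatterpie
      mutate(annotation2 = as.numeric(annotation)) %>%
      # shrink the counts; undone later in scale_y_continuous()
      mutate(count_in_integratedcluster = count_in_integratedcluster/15)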
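
    As a quick sanity check on the reshaping step, here is a minimal base-R sketch on a three-row subset of the toy data. The `toy_` names are made up for illustration and are not part of the answer's code. It shows that the numeric positions can be mapped back to the original labels through levels(), and that the two pie slices sum to 1 for every row.

    ```r
    # Hypothetical three-row subset of the question's toy data
    toy_annotation <- c("celltype1", "celltypeA", "celltypeAlpha")
    toy_in_cluster <- c(253, 42, 58)
    toy_in_source  <- c(413, 195, 82)

    # Two slices per pie: share inside the cluster and the remainder
    pos <- toy_in_cluster / toy_in_source
    neg <- 1 - pos

    # Categorical -> numeric position (the answer reports that scatterpie
    # does not handle categorical axes)
    toy_factor   <- factor(toy_annotation)
    toy_position <- as.numeric(toy_factor)

    # The labels can be recovered from the positions via levels()
    recovered <- levels(toy_factor)[toy_position]

    print(data.frame(toy_position, recovered,
                     pos = round(pos, 2), neg = round(neg, 2)))
    ```

    The same lookup, levels(...)[.x], is what the x-axis label function in the plotting code relies on.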
    

    Then the plotting code is:

    ggplot(fake_dataframe,
           aes(x = annotation2, y = count_in_integratedcluster)) +
      geom_segment(aes(xend = annotation2, yend = 0), color = "grey") +
      geom_scatterpie(cols = c("pos", "neg"),
                      data = fake_dataframe,
                      aes(x = annotation2, y = count_in_integratedcluster)) +
      scale_fill_manual(values = c(pos = "black", neg = "white")) +
      coord_flip() +
      theme(
        legend.position = "none",
        panel.border = element_blank(),
        panel.spacing = unit(0.1, "lines"),
        strip.text.x = element_text(size = 8)
      ) +
      facet_grid(species_organ~., scales = "free_y", space = "free_y") +
      labs(title = paste(paste("integrated", cluster_name), ",", 
                         how_many_cells_in_this_integrated_cluster, "total cells"), 
           subtitle = paste0("In this integrated cluster, ",
                             "see what cells contribute per species"),
           y = paste0("How many times cells of this original annotation (y-axis)\n",
                      "showed up in this integrated cluster (plot title)"),
           x = NULL) +
      scale_y_continuous(labels = ~.x * 15) +
      scale_x_continuous(labels = ~ levels(fake_dataframe$annotation)[.x])
    

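
    The divide-by-15 rescaling in the code above is easy to lose track of, so here is a small base-R sketch of the round trip. `scale_factor` and `relabel` are illustrative names, not part of the answer; 15 is the value the answer uses, and I am assuming it was picked by eye so that the counts land on a range comparable to the 1 to 15 annotation positions.

    ```r
    # Value used in the answer; assumed here to have been chosen by eye
    scale_factor <- 15

    # Frog counts from the question's toy data
    counts <- c(253, 245, 226, 187, 185)

    # What actually gets plotted on the count axis
    scaled <- counts / scale_factor

    # Same idea as labels = ~ .x * 15 in scale_y_continuous()
    relabel <- function(x) x * scale_factor

    print(range(scaled))    # plotted range, comparable to the 1..15 positions
    print(relabel(scaled))  # axis labels show the original counts again
    ```

    If your real counts are much larger or smaller than the toy ones, the divisor will probably need adjusting until the pies look round.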