Multiple, dependent-level sunburst/doughnut chart using ggplot2


I'm trying to create a two-level sunburst/doughnut diagram (for print) where the second level is a detailed view of the first. I've read and understood this tutorial, but I'm an R and ggplot2 newbie and am having trouble producing the second level. In the aforementioned article, the root level only has one element (which is a bit redundant), whereas my root has many elements, each of which has at least 1 and up to 10 elements at the second level.

Let's say my data has three columns: name, type and value, where name and type define the root and second-level elements, respectively. Each name has exactly one row with type all, whose value is the sum of the values across that name's other types (there is always at least one other type, and the sets of types for different names may intersect or be mutually exclusive). For example:

name  type    value
----- ------- ------
foo   all     444
foo   type1   123
foo   type2   321
bar   all     111
bar   type3   111
baz   all     999
baz   type1   456
baz   type3   543
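
For reference, a minimal sketch of the same table as a data frame, assuming the object is called `data` as in the snippets in this question. The check at the end only illustrates the stated rule that each `all` row equals the sum of that name's other types.

```r
# Example data from the table above; the object name `data` is an assumption
# taken from the snippets in this question.
data <- data.frame(
  name  = c("foo", "foo", "foo", "bar", "bar", "baz", "baz", "baz"),
  type  = c("all", "type1", "type2", "all", "type3", "all", "type1", "type3"),
  value = c(444, 123, 321, 111, 111, 999, 456, 543),
  stringsAsFactors = FALSE
)

# Sum the non-"all" rows per name and compare with the "all" rows.
detail   <- data[data$type != "all", ]
totals   <- tapply(detail$value, detail$name, sum)
all_rows <- data[data$type == "all", ]
stopifnot(all(totals[all_rows$name] == all_rows$value))
```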

I can create the root level stack (before being converted to polar coordinates) using:

data.all <- data[data$type == "all",]
ggplot(data.all, aes(x=1, y=value, fill=name)) + geom_bar(stat="identity")

What I need for the second level stack is for the type values to align within the name values, proportional to their value:

 +-----+  +-------+
 |     |  | type3 |
 | baz |  +-------+
 |     |  | type1 |
 +-----+  +-------+
 |     |  |       |
 | bar |  | type3 |
 |     |  |       |
 +-----+  +-------+
 |     |  | type2 |
 | foo |  +-------+
 |     |  | type1 |
-+-----+--+-------+-

(n.b., this is obviously not to scale!)
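
To make the alignment requirement concrete, the segment boundaries in the diagram can be worked out with cumulative sums. This is only a sketch in base R: the bottom-to-top order `foo`, `bar`, `baz` is read off the diagram, and `name_order`, `lvl1` and `lvl2` are illustrative names rather than anything from the post.

```r
data <- data.frame(
  name  = c("foo", "foo", "foo", "bar", "bar", "baz", "baz", "baz"),
  type  = c("all", "type1", "type2", "all", "type3", "all", "type1", "type3"),
  value = c(444, 123, 321, 111, 111, 999, 456, 543),
  stringsAsFactors = FALSE
)

# Assumed stacking order, bottom to top, as drawn in the diagram.
name_order <- c("foo", "bar", "baz")

# First level: one block per name, stacked in that order.
lvl1 <- data[data$type == "all", ]
lvl1 <- lvl1[match(name_order, lvl1$name), ]
lvl1$ymax <- cumsum(lvl1$value)
lvl1$ymin <- lvl1$ymax - lvl1$value

# Second level: sort by the parent's position first, then by type,
# so each type block falls inside its parent's range.
lvl2 <- data[data$type != "all", ]
lvl2 <- lvl2[order(match(lvl2$name, name_order), lvl2$type), ]
lvl2$ymax <- cumsum(lvl2$value)
lvl2$ymin <- lvl2$ymax - lvl2$value

# The top of each name's last type block coincides with the top of the name block.
tops <- tapply(lvl2$ymax, lvl2$name, max)
stopifnot(all(tops[lvl1$name] == lvl1$ymax))
```

Because both levels are sorted by the same name order and every `all` value equals the sum of its types, the two stacks share their boundaries at every name change.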

I also need the type values to be coloured consistently (e.g., the colour of the type1 block should be the same for both foo and baz, etc.)

I thought I could do this by combining the name and type columns into a new column and then colouring by this:

data.other <- data[data$type != "all",]
data.other$comb <- paste(data.other$name, data.other$type, sep=":")
ggplot(data.other, aes(x=2, y=value, fill=comb)) + geom_bar(stat="identity")

However, this breaks the colouring consistency -- obviously, in hindsight -- and, anecdotally, I have absolutely no faith that the alignment will be correct.

My R/ggplot2 naivety is probably pretty apparent (sorry!); how can I achieve what I'm looking for?


EDIT I also came across this question and answer, however my data looks different to theirs. If my data can be munged into the same shape -- which I don't know how to do -- then my question becomes a special case of theirs.


Solution

  • This might only be partway there, and it might not scale well to a much more complex dataset. I got intensely curious about how to do this, and I have a similar, larger dataset I'm trying to visualize for work, so this is actually helping me out with my job too :)

    Basically what I did is split the dataset into dataframes for three levels: a parent level that's basically dummy data, a level 1 df with sums of all the types under each name (I suppose I could have just filtered your data for type == "all"--I didn't have a similar column for my work data), and a level 2 that's all the outer nodes. Bind them all together, make a stacked bar chart, give it polar coordinates.

    The one I did for work had a lot more labels, and they were pretty long, so I used ggrepel::geom_text_repel for the labels instead. They quickly became unwieldy and ugly.

    Obviously the aesthetics here leave something to be desired, but I think it could be beautified to your liking.

    library(tidyverse)
    
    df <- "name  type    value
    foo   all     444
    foo   type1   123
    foo   type2   321
    bar   all     111
    bar   type3   111
    baz   all     999
    baz   type1   456
    baz   type3   543" %>% read_table2() %>%
        filter(type != "all") %>%
        mutate(name = as.factor(name) %>% fct_reorder(value, sum)) %>%
        arrange(name, value) %>%
        mutate(type = as.factor(type) %>% fct_reorder2(name, value))
    
    lvl0 <- tibble(name = "Parent", value = 0, level = 0, fill = NA)
    
    lvl1 <- df %>%
        group_by(name) %>%
        summarise(value = sum(value)) %>%
        ungroup() %>%
        mutate(level = 1) %>%
        mutate(fill = name)
    
    lvl2 <- df %>%
        select(name = type, value, fill = name) %>%
        mutate(level = 2)
    
    
    bind_rows(lvl0, lvl1, lvl2) %>%
        mutate(name = as.factor(name) %>% fct_reorder2(fill, value)) %>%
        arrange(fill, name) %>%
        mutate(level = as.factor(level)) %>%
        ggplot(aes(x = level, y = value, fill = fill, alpha = level)) +
            geom_col(width = 1, color = "gray90", size = 0.25, position = position_stack()) +
            geom_text(aes(label = name), size = 2.5, position = position_stack(vjust = 0.5)) +
            coord_polar(theta = "y") +
            scale_alpha_manual(values = c("0" = 0, "1" = 1, "2" = 0.7), guide = F) +
            scale_x_discrete(breaks = NULL) +
            scale_y_continuous(breaks = NULL) +
            scale_fill_brewer(palette = "Dark2", na.translate = F) +
            labs(x = NULL, y = NULL) +
            theme_minimal()
    

    Created on 2018-04-24 by the reprex package (v0.2.0).
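
A possible variation on the above, sketched under a few assumptions: it takes the inner ring straight from the `type == "all"` rows (the shortcut mentioned earlier), and it swaps the stacked `geom_col` for `geom_rect` with explicitly computed boundaries, so the alignment does not depend on stacking order. Filling by the segment label means `type1` gets the same colour under both `foo` and `baz`. The helper `add_bounds` and the names `name_order`, `ring1`, `ring2` and `rings` are hypothetical and not part of the answer's code.

```r
library(ggplot2)

# Example data from the question.
data <- data.frame(
  name  = c("foo", "foo", "foo", "bar", "bar", "baz", "baz", "baz"),
  type  = c("all", "type1", "type2", "all", "type3", "all", "type1", "type3"),
  value = c(444, 123, 321, 111, 111, 999, 456, 543),
  stringsAsFactors = FALSE
)

# Assumed order of the inner ring; any fixed order works.
name_order <- c("foo", "bar", "baz")

# Hypothetical helper: cumulative start and end of each segment.
add_bounds <- function(d) {
  d$ymax <- cumsum(d$value)
  d$ymin <- d$ymax - d$value
  d
}

# Inner ring: the existing "all" rows, labelled by name.
ring1 <- data[data$type == "all", ]
ring1 <- ring1[match(name_order, ring1$name), ]
ring1$label <- ring1$name
ring1$level <- 1

# Outer ring: the remaining rows, sorted by parent name, labelled by type.
ring2 <- data[data$type != "all", ]
ring2 <- ring2[order(match(ring2$name, name_order), ring2$type), ]
ring2$label <- ring2$type
ring2$level <- 2

rings <- rbind(add_bounds(ring1), add_bounds(ring2))

# x runs outward from the centre; the gap from 0 to 1 leaves the doughnut hole.
p <- ggplot(rings) +
  geom_rect(aes(xmin = level, xmax = level + 1, ymin = ymin, ymax = ymax,
                fill = label), colour = "gray90") +
  geom_text(aes(x = level + 0.5, y = (ymin + ymax) / 2, label = label),
            size = 2.5) +
  xlim(0, 3) +
  coord_polar(theta = "y") +
  theme_void()
```

Calling `print(p)` draws the chart. Names and types share a single fill scale here, so the legend mixes both; a manual palette via `scale_fill_manual()` could separate them visually if that matters.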