Tags: r, ggplot2, stacked-chart, stackedbarseries

R ggplot2: problems combining a stacked bar plot of 3 variables with a mirror density chart


Dear Stack Overflow community,

Once again, I have a question about what ggplot2 can do in R. Before explaining my problem, here is an example data frame:

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)
  • anticoagulation coding: 0 = no anticoagulation, 1 = received anticoagulation
  • atc coding: 0 = nitrofurantoin, 1 = fosfomycin, 2 = trimethoprim
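Because anticoagulation and atc are stored as plain numbers, ggplot2 treats them as continuous when they are mapped to fill, which gives a colour gradient instead of separate groups. A minimal sketch that adds labelled factor copies of both columns, following the coding listed above (the column names anticoagulation_lab and atc_lab are hypothetical):

```r
age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

# Hypothetical labelled copies of the numeric codes; factor() keeps the
# level order given in 'levels', so legends follow the coding 0, 1, 2
df$anticoagulation_lab <- factor(df$anticoagulation, levels = c(0, 1),
                                 labels = c("no anticoagulation", "received anticoagulation"))
df$atc_lab <- factor(df$atc, levels = c(0, 1, 2),
                     labels = c("nitrofurantoin", "fosfomycin", "trimethoprim"))

# Cross-tabulation of age by the labelled atc groups
table(df$age, df$atc_lab)
```

Mapping fill = atc_lab (a factor) instead of the numeric atc gives one discrete colour per antibiotic.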

I want to visualise the differences in anticoagulation prescription per age group and per atc group. What I have done so far:

frame <- aggregate(df$anticoagulation, by=list(df$age), FUN=length)
frame$age <- frame$Group.1
frame$n <- frame$x
frame <- frame [,3:4]

my_table<- table(df$age, df$anticoagulation)
table <- as.data.frame.matrix(my_table)
frame$n_noanti <- table$"0"
frame$n_yesanti <- table$"1"

frame$per_yesanti <- (frame$n_yesanti/frame$n)*100 # percentage
frame$per_noanti <- (frame$n_noanti/frame$n)*100 # percentage


ggplot(frame, aes(x=x) ) +
  geom_bar( aes(x = reorder (age, -per_yesanti), y =per_yesanti), stat="identity", fill="#69b3a2" ) +
  geom_label(aes(x=15, y=100, label="Used anticoagulants"), color="#69b3a2")+
  geom_bar( aes( x =reorder (age, -per_noanti), y=-per_noanti), stat="identity", fill="#404080" ) +
  geom_label( aes(x=15, y=-100, label="No anticoagulants"), color="#404080") +
  theme(axis.text.x=element_blank()) + 
  xlab ("Age") + 
  ylab ("Percentages of how many women used anticoagulants")+
  ggtitle("Distribution of anticoagulants per age")+
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size=15))

Output: [image: output of the ggplot mirror density chart above]
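The per_yesanti and per_noanti columns above can also be read straight off the contingency table: prop.table() with margin = 1 divides each age row by its row total. A small base-R sketch on the example data:

```r
age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

# Counts per age (rows) and anticoagulation code (columns "0" and "1")
my_table <- table(df$age, df$anticoagulation)

# Divide every row by its row total, so each age row sums to 100
pct <- prop.table(my_table, margin = 1) * 100
pct

# Column "1" should match frame$per_yesanti, column "0" frame$per_noanti
per_yesanti <- as.numeric(pct[, "1"])
per_noanti  <- as.numeric(pct[, "0"])
```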

However, I would like a similar graph, but with stacked bars, like this: [image: example of stacked bars]

The stacked segments should be based on the atc coding. I have tried to make only a stacked graph, but that failed miserably.

I have tried it with the function 'aggregate', but I am stuck on what to use and what to merge together.

frame2 <- aggregate(df$anticoagulation, by=list(df$age, df$atc), FUN=length)

However, this aggregation returns one row per age and atc combination, which is too long for me to use.

I have also tried making a separate table of atc versus age and adding its counts to 'frame':

atc2<- table(df$age, df$atc)
t_atc2 <- as.data.frame.matrix(atc2)
frame$n_nitro <- t_atc2$"0"
frame$n_fosfo <- t_atc2$"1"
frame$n_trim <- t_atc2$"2"

But I still cannot get the stacking to work. Here is my attempt at a stacked bar chart with only the percentage of anticoagulation = yes (coded 1):

ggplot(frame, aes(fill = n_nitro+n_fosfo+n_trim, y=per_yesanti, x=age)) + 
  geom_bar(position="stack", stat="identity") +
  ggtitle("Anticoagulation per age")

Resulting graph: [image: no distinction between the atc groups]

I hope someone can combine the two graphs. If that is impossible, a stacked graph with only the percentage of anticoagulation = 1 (per_yesanti) would be fine as well.

So, in short, if the combined graph is too difficult, how can I make the following single graph:

  • only the data with anticoagulation = 1 (yes)
  • anticoagulation shown as a percentage (calculated from the total of anticoagulation yes and no)
  • age on the x-axis
  • the bars filled by atc

Like this: [image: desired graph]

Thanks in advance!


Solution

  • I'm still not sure what to make of your data, but I'll try to give an answer. It's a bit difficult to get bar plots of percentages grouped by another variable directly in ggplot2. The easiest solution is therefore to calculate the percentages beforehand and then plot them with geom_col.
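Since the question itself works in base R, here is a sketch of the same idea (precompute the percentages, then geom_col()) without dplyr, for the single graph asked for: only anticoagulation = 1, percentages relative to all women of that age, bars filled by atc. It assumes the denominator is the total per age group, as in the Edit further down; the object names counts, totals and yes are only illustrative.

```r
library(ggplot2)

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

# Long format: one row per age x anticoagulation x atc combination,
# with the number of women in the column 'Freq' (factor columns)
counts <- as.data.frame(table(age = df$age,
                              anticoagulation = df$anticoagulation,
                              atc = df$atc))

# Total number of women per age group, used as the denominator
totals <- as.data.frame(table(age = df$age), responseName = "count_age")

counts <- merge(counts, totals, by = "age")
counts$percentage <- counts$Freq / counts$count_age * 100

# Keep only anticoagulation = 1; geom_col() stacks the rows by atc
yes <- subset(counts, anticoagulation == "1")

ggplot(yes, aes(x = age, y = percentage, fill = atc)) +
  geom_col() +
  ylab("Percentage with anticoagulation = 1")
```

The key difference from the attempt in the question is the long format: each atc group is its own row, so fill gets a discrete variable instead of one summed number per bar.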

    Using dplyr, you can group_by both age and the other variable you want the stacks to be split by:

    age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
    anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
    atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
    df <- data.frame(age, anticoagulation, atc)
    
    library(dplyr)
    library(ggplot2)
    
    df_summary <- df %>% 
      group_by(age, anticoagulation) %>% 
      summarise(count = n(), .groups = "drop_last") %>%  # result stays grouped by age
      mutate(percentage = count / sum(count) * 100)
    
    
    ggplot(df_summary, aes(x = factor(age), y = percentage, fill = factor(anticoagulation))) +
      geom_col()
    

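Why do the percentages add up to 100 within each age? summarise() drops the last grouping variable (anticoagulation), so its result is still grouped by age and the following mutate() takes sum(count) per age. A small check of that behaviour, with .groups = "drop_last" spelled out (this is what summarise() does by default here, but stating it silences the grouping message):

```r
library(dplyr)

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

df_summary <- df %>%
  group_by(age, anticoagulation) %>%
  summarise(count = n(), .groups = "drop_last") %>%  # still grouped by age
  mutate(percentage = count / sum(count) * 100)     # sum(count) is per age

# Remaining grouping variable after summarise()
group_vars(df_summary)

# Each age group should add up to 100 %
df_summary %>% summarise(total = sum(percentage))
```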

    df_summary_2 <- df %>% 
      group_by(age, atc) %>% 
      summarise(count = n(), .groups = "drop_last") %>%  # result stays grouped by age
      mutate(percentage = count / sum(count) * 100)
    
    ggplot(df_summary_2, aes(x = factor(age), y = percentage, fill = factor(atc))) +
      geom_col()
    



    Edit

    I've adapted my graph. I couldn't come up with a way to calculate everything in one go, so I first calculate the counts per age group in total_count_info, which lets me calculate the percentage for every age group later. Then I count the occurrences of atc per age and anticoagulation:

    total_count_info <- df %>% 
      group_by(age) %>% 
      summarise(count_age = n())
    
    df_summary_3 <- df %>% 
      group_by(age, anticoagulation, atc) %>% 
      summarise(count = n(), .groups = "drop") %>% 
      left_join(total_count_info, by = "age") %>% 
      mutate(percentage = count / count_age * 100)
    
    
    ggplot(df_summary_3 %>% filter(anticoagulation == 1),
           aes(x = factor(age), y = percentage, fill = factor(atc))) +
      geom_col() +
      ylab("percentage of anticoagulation == 1")
    

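For the mirrored version asked for in the question, one possible sketch: count() collapses the age/anticoagulation/atc combinations in one step, group_by(age) + mutate() turns them into percentages of all women of that age, and the "no anticoagulation" rows get negative values so they stack below zero. The signed_percentage column, the ATC labels and the annotate() positions are assumptions for illustration; the mirror relies on position_stack() in ggplot2 stacking positive and negative values separately.

```r
library(dplyr)
library(ggplot2)

age <- c(12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15)
anticoagulation <- c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
atc <- c(1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 0)
df <- data.frame(age, anticoagulation, atc)

df_mirror <- df %>%
  count(age, anticoagulation, atc, name = "count") %>%  # one row per observed combination
  group_by(age) %>%
  mutate(percentage = count / sum(count) * 100) %>%      # share of all women of that age
  ungroup() %>%
  mutate(
    atc = factor(atc, levels = c(0, 1, 2),
                 labels = c("nitrofurantoin", "fosfomycin", "trimethoprim")),
    # Hypothetical mirroring: "no anticoagulation" is drawn below zero
    signed_percentage = ifelse(anticoagulation == 1, percentage, -percentage)
  )

ggplot(df_mirror, aes(x = factor(age), y = signed_percentage, fill = atc)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  annotate("text", x = 0.6, y = 90, label = "Used anticoagulants", hjust = 0) +
  annotate("text", x = 0.6, y = -90, label = "No anticoagulants", hjust = 0) +
  scale_y_continuous(labels = abs) +
  labs(x = "Age", y = "Percentage of women", fill = "ATC")
```

scale_y_continuous(labels = abs) only changes the axis labels, so the lower half still reads as positive percentages while the bars point downwards.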