
How to make a percentage plot histogram in R / ggplot


I am trying to create a graph similar to the following graph as part of a research project:

[image: example graph]

In my CSV file, one column holds blood pressure (a continuous variable) and another holds whether the patient survived (a binary yes/no variable). Is there any way I can create this graph using ggplot in R?

Essentially, I'd like blood pressure on the x-axis in discrete 10 mmHg intervals, plotted against the number or proportion of patients within each interval who survived.

I'm quite new to R so apologies if this is a basic question. I couldn't find the answer on the forums. Thanks in advance.


Solution

  • Suppose your data looks something like this:

    set.seed(2)
    df <- data.frame(SBP = sample(101:199, 1000, TRUE))
    df$survived <- c('yes', 'no')[rbinom(1000, 1, (df$SBP - 100)/200) + 1]
    
    head(df)
    #>   SBP survived
    #> 1 185       no
    #> 2 179      yes
    #> 3 170       no
    #> 4 106      yes
    #> 5 132      yes
    #> 6 108      yes
    

    Then you can do:

    library(tidyverse)
    
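    df %>%
      # Assign each reading to the midpoint of its 10 mmHg bin (e.g. 130-139 -> 135)
      mutate(BP = 10 * floor(SBP/10) + 5) %>%
      # Proportion surviving and patient count per bin (.by needs dplyr >= 1.1.0)
      summarize(survival = sum(survived == 'yes')/n(),
                n = n(), .by = BP) %>%
      ggplot(aes(BP, survival)) +
      # One outlined bar per bin, spanning the full 10 mmHg width
      geom_col(width = 10, fill = NA, color = 'black') +
      # Label each bar with the whole-number percentage and the bin's patient count
      geom_text(aes(label = paste0(scales::percent(survival, 1),
                                   '\n(n = ', n, ')')),
                nudge_y = -0.1) +
      theme_classic(base_size = 16) +
      # Put axis ticks on the bin boundaries and show the y-axis as percentages
      scale_x_continuous(breaks = seq(100, 200, 10)) +
      scale_y_continuous(labels = scales::percent)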
    


    To use this with your own data, replace the data frame name (df) and the column names (SBP, survived) with the ones in your data.
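
Since the question starts from a CSV file, here is a minimal base-R sketch of that adaptation. The file, its column names (`systolic_bp`, `outcome`) and its values are made up for illustration; it writes a small temporary CSV so that it runs on its own. It renames the columns to the names used in the code above, then repeats the binning and per-bin proportion as a quick sanity check.

```r
# Write a tiny stand-in CSV so the sketch is self-contained.
# The file, its column names and its values are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("systolic_bp,outcome",
             "101,yes", "109,no", "110,yes", "119,yes", "185,no", "199,no"),
           csv_path)

# Read the file and rename the columns to the names the code above expects
df <- read.csv(csv_path)
names(df)[names(df) == "systolic_bp"] <- "SBP"
names(df)[names(df) == "outcome"] <- "survived"

# Same binning as the mutate() step: midpoint of each 10 mmHg interval
df$BP <- 10 * floor(df$SBP / 10) + 5
df$BP
#> [1] 105 105 115 115 185 195

# Proportion surviving per bin, a base-R counterpart to the summarize() step
tapply(df$survived == "yes", df$BP, mean)
```

With your real file, drop the `writeLines()` step and point `read.csv()` at your own path. Once the columns are renamed, `df` can go straight into the ggplot pipeline.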