I am trying to create a graph similar to the following graph as part of a research project:
My CSV file has one column holding blood pressure (a continuous variable) and another holding survival (a binary yes/no variable). Is there any way I can create this graph using ggplot in R?
Essentially, I'd like blood pressure on the x-axis in discrete 10 mmHg intervals, plotted against the number or proportion of patients in each interval who survived.
I'm quite new to R so apologies if this is a basic question. I couldn't find the answer on the forums. Thanks in advance.
Suppose your data looks something like this:
set.seed(2)
df <- data.frame(SBP = sample(101:199, 1000, TRUE))
df$survived <- c('yes', 'no')[rbinom(1000, 1, (df$SBP - 100)/200) + 1]
head(df)
#>   SBP survived
#> 1 185       no
#> 2 179      yes
#> 3 170       no
#> 4 106      yes
#> 5 132      yes
#> 6 108      yes
Then you can do:
library(tidyverse)

df %>%
  # Assign each reading to the midpoint of its 10 mmHg interval (100-109 -> 105)
  mutate(BP = 10 * floor(SBP/10) + 5) %>%
  # Proportion surviving and patient count per interval (.by needs dplyr >= 1.1.0)
  summarize(survival = sum(survived == 'yes')/n(),
            n = n(), .by = BP) %>%
  ggplot(aes(BP, survival)) +
  geom_col(width = 10, fill = NA, color = 'black') +
  geom_text(aes(label = paste0(scales::percent(survival, 1),
                               '\n(n = ', n, ')')),
            nudge_y = -0.1) +
  theme_classic(base_size = 16) +
  scale_x_continuous(breaks = seq(100, 200, 10)) +
  scale_y_continuous(labels = scales::percent)
To use this with your own data, replace df, SBP and survived with the names of your own data frame and columns.
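In case the binning step is unclear, here is a minimal base R sketch of what 10 * floor(SBP/10) + 5 does and how the per-interval proportion comes out of it. The six readings and outcomes below are made up purely for illustration, and sbp, bp_mid and prop are hypothetical names that are not part of the code above.

```r
# Hypothetical readings, chosen to sit on either side of the interval edges
sbp <- c(101, 109, 110, 119, 185, 199)

# floor(sbp / 10) * 10 gives the lower edge of each 10 mmHg interval;
# adding 5 moves it to the interval midpoint, which centres the bar
bp_mid <- 10 * floor(sbp / 10) + 5
bp_mid
#> [1] 105 105 115 115 185 195

# Hypothetical outcomes for the same six patients
survived <- c('yes', 'no', 'yes', 'yes', 'no', 'yes')

# Proportion surviving within each interval: the mean of a logical vector
# is the share of TRUE values, which equals sum(survived == 'yes') / n()
prop <- tapply(survived == 'yes', bp_mid, mean)
prop
```

Here the 105 interval holds two patients, one of whom survived, so its proportion is 0.5. Because each bar is centred on the midpoint and is 10 wide, it spans its interval from edge to edge, which is why the plot uses width = 10.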