There's a book called "Event History Analysis with R" by G. Broström. But it provides minimal R code for whatever concepts are presented. I'd like to plot a hazard function, how do I do that? I want the exact same plots as given here. The plots shown are in this chapter. If possible, I'd like to do the same plots using ggplot2.
I've done survival and cumulative hazard plots using ggplot2 that look better, using the code:
library(survminer)
fit <- survfit(Surv(enter, exit, event) ~ male, data = df_final)
ggsurvplot(fit, title = "Survival Function") +
xlab("Age in years") +
ylab("Survival")
My data looks like this:
id | male | enter | exit | event |
---|---|---|---|---|
001 | 1 | 0.00 | 13.34 | 1 |
001 | 1 | 13.34 | 45.00 | 0 |
002 | 1 | 0.00 | 27.00 | 0 |
003 | 0 | 0.00 | 23.60 | 1 |
003 | 0 | 23.60 | 27.00 | 1 |
003 | 0 | 27.00 | 67.00 | 0 |
You can get the R codes to reproduce the figures in the book from the book's GitHub repo.
For example, for the figures you are trying to replicate you could copy and paste the basic data wrangling code from the repo and convert the base R plotting code to ggplot2
like so where I also made use of a bit of dplyr
, tidyr
and patchwork
to combine the subplots:
library(eha)
library(tidyverse)
library(patchwork)
females <- swepop[
swepop$sex == "women" & swepop$year %in% 2001:2020,
c("age", "pop", "year")
]
females$deaths <- swedeaths[swedeaths$sex == "women" &
swedeaths$year %in% 2001:2020, "deaths"]
females <- aggregate(females[c("pop", "deaths")], by = females["age"], FUN = sum)
females$mort <- females$deaths / females$pop
males <- swepop[
swepop$sex == "men" & swepop$year %in% 2001:2020,
c("age", "pop")
]
males$deaths <- swedeaths[swedeaths$sex == "men" &
swedeaths$year %in% 2001:2020, "deaths"]
males <- aggregate(males[c("pop", "deaths")], by = males["age"], FUN = sum)
males$mort <- males$deaths / males$pop
dat <- list(
males = males,
females = females
) |>
dplyr::bind_rows(.id = "sex")
p1 <- ggplot(dat, aes(age, mort, linetype = sex)) +
geom_hline(yintercept = 0) +
geom_line() +
scale_linetype_discrete(
labels = c(males = "Men", females = "Women"),
breaks = c("males", "females")
) +
labs(x = "Age", y = "Hazards", linetype = NULL) +
theme_bw() +
theme(
legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.box.background = element_rect(color = "black")
)
p2 <- dat |>
select(sex, age, mort) |>
pivot_wider(names_from = sex, values_from = mort) |>
mutate(hazard = males / females) |>
ggplot(aes(age, hazard)) +
geom_hline(yintercept = 1) +
geom_line() +
labs(x = "Age", y = "Hazard ratio", linetype = NULL) +
theme_bw() +
theme(
legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.box.background = element_rect(color = "black")
)
p1 + p2 +
plot_annotation(
title = "Female and male mortality by age, Sweden 2001-2020.",
subtitle = "The right hand panel shows the ratio between the male and female mortality by age."
)