I am trying to code R in order to obtain growth rate for COVID-19.
The equation can be found on the inserted image where i(t)
is the number of infected individuals at time t
.
I think I could code if the equation were to be simple growth rate i(t) = i0 * exp(r*t)
for example; however, the paper I am referring to implemented following method: "Time series data at each time point, a bootstrap method of extracting 1000 repeated observations was used, assuming that it follows a Poisson distribution with it^Data as the mean."
I have never coded such equations before :( I'm hoping this to be the great opportunity & initiation of being better coder & statistician.
Can somebody please help me out with this? I would really appreciate any type of hints or suggestions. I am never expecting whole code or anything near that; I would love to find a source where I could start this off well.
Here's an example using some AIDS case data (taken from Dobson and Barnett's GLM introduction).
aids_dat <- read.csv("https://raw.githubusercontent.com/bbolker/goettingen_2019/master/data/aids.csv")
## generate more useful time variable
aids_dat <- transform(aids_dat, time = year - min(year) + (quarter-1)/4,
cumcases = cumsum(cases))
nobs <- nrow(aids_dat)
(you would use whatever kind of model you wanted here)
f1 <- lm(log(cases) ~ time, data = aids_dat)
set.seed(101)
fitted <- predict(f1)
s <- replicate(1000,
rpois(nobs, lambda = exp(fitted)),
simplify = FALSE)
This takes the predicted values from the log-linear fit and adds Poisson variation. Chowell et al. do a bunch of stuff with the cumulative cases numbers that I don't understand (and generally think isn't a great idea anyway); it looks like they generate predicted values, create a cumulative case count, and immediately difference it to get back to cases? It's worth being careful about this stuff (keep in mind the difference between prevalence and incidence), but I'm not quite sure what Chowell et al. are doing.
refit <- lapply(s,
function(x) lm(log(x+0.1) ~ time, data = aids_dat))
I used log(x+0.1)
in case the Poisson sample generated zero values. (It would be better to use a Poisson GLM or something.)
boot_ensemble <- sapply(refit, predict)
You could also use *apply
or a for
loop to extract coefficients from each refitted model, compute epidemic metrics (growth rate or R0), etc. — whatever values you want the bootstrap distribution for.
png("boot.png")
plot(cases ~ time, data = aids_dat)
matlines(aids_dat$time, exp(boot_ensemble),
type = "l", lty = 1,
col = adjustcolor("black", alpha = 0.01))
matlines(aids_dat$time,
col = "red", lty = 1,
t(apply(exp(boot_ensemble), 1, range)))
dev.off()
You can get pointwise confidence intervals by applying quantile
to each row of the prediction ensemble, although see Juul et al 2021 for cautionary notes. (You can use the fda::fbplot()
function to plot similar curves to theirs.)
Final cautionary note, I suspect that using Poisson (rather than e.g. negative binomial) parametric bootstrapping will often underestimate uncertainty.
Juul, Jonas L., Kaare Græsbøll, Lasse Engbo Christiansen, and Sune Lehmann. “Fixed-Time Descriptive Statistics Underestimate Extremes of Epidemic Curve Ensembles.” Nature Physics 17, no. 1 (January 2021): 5–8. https://doi.org/10.1038/s41567-020-01121-y.