I am trying to calculate the incidence of a disease per year and per age-category. I also want to apply direct standardization. Im using the function ageadjust.direct
(package epitools
).
age_cat persondays_individual contactfirst_cat ESPpop personyears year
1 <40 38624 3 26938 106. 2011
2 >90 367 2 691 1.01 2011
3 41-50 111208 10 14214 305. 2011
4 51-60 222777 29 13534 610. 2011
5 61-70 219567 41 11403 602. 2011
6 71-80 102593 26 77484 281. 2011
7 81-90 20056 10 3945 54.9 2011
8 <40 32673 6 26938 89.5 2012
9 >90 366 0 691 1.00 2012
10 41-50 102182 11 14214 280. 2012
11 51-60 209241 29 13534 573. 2012
12 61-70 224701 33 11403 616. 2012
13 71-80 104898 26 77484 287. 2012
14 81-90 23771 9 3945 65.1 2012
df<-structure(list(age_cat = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("<40", ">90", "41-50",
"51-60", "61-70", "71-80", "81-90"), class = "factor"), persondays_individual = c(38624,
367, 111208, 222777, 219567, 102593, 20056, 32673, 366, 102182,
209241, 224701, 104898, 23771), contactfirst_cat = c(3, 2, 10,
29, 41, 26, 10, 6, 0, 11, 29, 33, 26, 9), ESPpop = c(26938, 691,
14214, 13534, 11403, 77484, 3945, 26938, 691, 14214, 13534, 11403,
77484, 3945), personyears = c(105.819178082192, 1.00547945205479,
304.679452054795, 610.34794520548, 601.553424657534, 281.076712328767,
54.9479452054794, 89.5150684931507, 1.0027397260274, 279.950684931507,
573.26301369863, 615.619178082192, 287.391780821918, 65.1260273972603
), year = c("2011", "2011", "2011", "2011", "2011", "2011", "2011",
"2012", "2012", "2012", "2012", "2012", "2012", "2012")), row.names = c(NA,
14L), class = "data.frame")
I would like to calculate the crude incidence ($contactfirst_cat / personyears), the adjusted incidence and the 95% CI per year and age category.
Desired output
age_cat persondays_individual contactfirst_cat ESPpop personyears year crude.rate adj. rate lci uci
1 <40 38624 3 26938 106. 2011
2 >90 367 2 691 1.01 2011
3 41-50 111208 10 14214 305. 2011
4 51-60 222777 29 13534 610. 2011
5 61-70 219567 41 11403 602. 2011
6 71-80 102593 26 77484 281. 2011
7 81-90 20056 10 3945 54.9 2011
8 <40 32673 6 26938 89.5 2012
9 >90 366 0 691 1.00 2012
10 41-50 102182 11 14214 280. 2012
I've tried the following code (which i got from Calculating crude and age-standardized rates, rate differences, and rate ratios with CIs by subgroups)
df%>%
group_by(year, age_cat) %>%
summarise(age_adjust = list(ageadjust.direct(count = contactfirst_cat, #count of events
pop = personyears, #person years of DFpop
rate = NULL,
stdpop = ESPpop, #standard population (European standard population per age_cat)
conf.level = 0.95))) %>%
mutate(age_adjust = map(age_adjust, as.data.frame.list)) %>%
ungroup()
However, this code doesnt give me the crude/adj rates and CI's per subgroup (year and age_cat), but i get only 1 observation of these 4 columns.
How can i create new columns for crude.rate, adj.rate, lci and uci PER year and age_cat? Any help would be very appreciated! Many thanks in advance!
EDIT When I run the script in Quinten's answer i get the following result:
crude.rate adj.rate lci uci
1 0.06070314 0.07801597 0.06298263 0.09764104
The output Quinten shows as df_2 is exactly what i want however. Im not sure what im doing wrong.
First you need to unnest
your code instead of ungroup
like this:
library(tidyverse)
library(epitools)
df_2 <- df %>%
group_by(year, age_cat) %>%
summarise(age_adjust = list(ageadjust.direct(count = contactfirst_cat, #count of events
pop = personyears, #person years of DFpop
rate = NULL,
stdpop = ESPpop, #standard population (European standard population per age_cat)
conf.level = 0.95))) %>%
mutate(age_adjust = map(age_adjust, as.data.frame.list)) %>%
unnest(cols = c(age_adjust))
Outputd df_2
:
# A tibble: 14 × 6
# Groups: year [2]
year age_cat crude.rate adj.rate lci uci
<chr> <fct> <dbl> <dbl> <dbl> <dbl>
1 2011 <40 0.0284 0.0284 0.00585 0.0829
2 2011 >90 1.99 1.99 0.241 7.19
3 2011 41-50 0.0328 0.0328 0.0157 0.0604
4 2011 51-60 0.0475 0.0475 0.0318 0.0682
5 2011 61-70 0.0682 0.0682 0.0489 0.0925
6 2011 71-80 0.0925 0.0925 0.0604 0.136
7 2011 81-90 0.182 0.182 0.0873 0.335
8 2012 <40 0.0670 0.0670 0.0246 0.146
9 2012 >90 0 0 NaN 3.68
10 2012 41-50 0.0393 0.0393 0.0196 0.0703
11 2012 51-60 0.0506 0.0506 0.0339 0.0727
12 2012 61-70 0.0536 0.0536 0.0369 0.0753
13 2012 71-80 0.0905 0.0905 0.0591 0.133
14 2012 81-90 0.138 0.138 0.0632 0.262
After you can bind your two dataframes to get the desired dataframe you want like this:
df_desired <- cbind(df, df_2[,c(3:6)])
df_desired
Output:
age_cat persondays_individual contactfirst_cat ESPpop personyears year crude.rate adj.rate lci uci
1 <40 38624 3 26938 105.819178 2011 0.02835025 0.02835025 0.005846503 0.08285146
2 >90 367 2 691 1.005479 2011 1.98910082 1.98910082 0.240889337 7.18531607
3 41-50 111208 10 14214 304.679452 2011 0.03282138 0.03282138 0.015739127 0.06035969
4 51-60 222777 29 13534 610.347945 2011 0.04751388 0.04751388 0.031820792 0.06823786
5 61-70 219567 41 11403 601.553425 2011 0.06815687 0.06815687 0.048910548 0.09246249
6 71-80 102593 26 77484 281.076712 2011 0.09250144 0.09250144 0.060425010 0.13553604
7 81-90 20056 10 3945 54.947945 2011 0.18199043 0.18199043 0.087271484 0.33468687
8 <40 32673 6 26938 89.515068 2012 0.06702782 0.06702782 0.024598029 0.14589135
9 >90 366 0 691 1.002740 2012 0.00000000 0.00000000 NaN 3.67880055
10 41-50 102182 11 14214 279.950685 2012 0.03929263 0.03929263 0.019614742 0.07030538
11 51-60 209241 29 13534 573.263014 2012 0.05058760 0.05058760 0.033879310 0.07265223
12 61-70 224701 33 11403 615.619178 2012 0.05360457 0.05360457 0.036898918 0.07528074
13 71-80 104898 26 77484 287.391781 2012 0.09046884 0.09046884 0.059097248 0.13255781
14 81-90 23771 9 3945 65.126027 2012 0.13819360 0.13819360 0.063190912 0.26233449