I have this data from a cohort of Alzheimer's disease patients. I would like to create a summary table (or contingency table) that shows the key information for this cohort: how many males and females there are, the average age of onset, the average age at last visit, the average age at death, and the number of samples (IID) with apoe4any. What approach should I take to create such a table in R?
dat <- structure(list(IID = structure(1:10, .Names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"), .Label = c("NACC000875",
"NACC003779", "NACC006805", "NACC008215", "NACC010067", "NACC010592",
"NACC011413", "NACC015383", "NACC017476", "NACC017538"), class = "factor"),
cohort = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
), .Label = "ADC8_AA", class = "factor"), sex = structure(c(`1` = 2L,
`2` = 2L, `3` = 2L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 1L,
`8` = 2L, `9` = 2L, `10` = 2L), .Label = c("1", "2"), class = "factor"),
status = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 2L, `6` = 1L, `7` = 2L, `8` = 1L, `9` = 2L, `10` = 2L
), .Label = c("1", "2"), class = "factor"), Race = structure(c(`1` = 1L,
`2` = 1L, `3` = 1L, `4` = 1L, `5` = 1L, `6` = 1L, `7` = 1L,
`8` = 1L, `9` = 1L, `10` = 1L), .Label = "2", class = "factor"),
Ethnicity = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
), .Label = "0", class = "factor"), age_onset = structure(c(NA,
NA, NA, NA, 1L, NA, 4L, NA, 2L, 3L), .Label = c(" 63", " 67",
" 71", " 79", "888"), class = "factor"), age_last_visit = structure(c(`1` = 6L,
`2` = 4L, `3` = 3L, `4` = 2L, `5` = 1L, `6` = 1L, `7` = 8L,
`8` = 7L, `9` = 1L, `10` = 5L), .Label = c("70", "71", "74",
"77", "78", "82", "86", "89"), class = "factor"), age_death = structure(c(NA,
NA, NA, 1L, NA, NA, 3L, 2L, NA, NA), .Label = c(" 72", " 88",
" 90", "888"), class = "factor"), apoe4any = structure(c(`1` = 1L,
`2` = 2L, `3` = 1L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 2L,
`8` = 2L, `9` = 2L, `10` = 2L), .Label = c("0", "1"), class = "factor")), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
R uses the factor class for categorical data. If you convert your ages (which are currently factors) to numeric, then summary(dat) will give you most of what you want.
convert_to_numeric = c("age_onset", "age_last_visit", "age_death")
dat[convert_to_numeric] = lapply(dat[convert_to_numeric], function(x) as.numeric(as.character(x)))
summary(dat)
#          IID        cohort   sex   status Race   Ethnicity   age_onset  age_last_visit
#  NACC000875:1   ADC8_AA:10   1:2   1:6    2:10   0:10      Min.   :63   Min.   :70.00
#  NACC003779:1                2:8   2:4                     1st Qu.:66   1st Qu.:70.25
#  NACC006805:1                                              Median :69   Median :75.50
#  NACC008215:1                                              Mean   :70   Mean   :76.70
#  NACC010067:1                                              3rd Qu.:73   3rd Qu.:81.00
#  NACC010592:1                                              Max.   :79   Max.   :89.00
#  (Other)   :4                                              NA's   :6
#    age_death     apoe4any
#  Min.   :72.00   0:3
#  1st Qu.:80.00   1:7
#  Median :88.00
#  Mean   :83.33
#  3rd Qu.:89.00
#  Max.   :90.00
#  NA's   :7
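See this common FAQ for an explanation of my factor-to-numeric conversion: calling as.numeric() directly on a factor returns its internal integer codes rather than the values shown in its labels, so the factor is converted to character first.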
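As a small illustration of why the conversion goes through as.character(), here is a sketch using a made-up factor (the name f and its values are only for demonstration and are not part of the question's data):

```r
# A hypothetical factor of ages stored as text labels
f <- factor(c("70", "82", "70"))

# Direct conversion returns the internal level codes, not the ages
codes <- as.numeric(f)

# Converting to character first recovers the values in the labels
ages <- as.numeric(as.character(f))

# An equivalent idiom that converts the levels once and then indexes by code
ages_via_levels <- as.numeric(levels(f))[f]

print(codes)
print(ages)
```

Here codes holds the positions of each value among the sorted levels, while ages and ages_via_levels hold the actual numbers.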
You can also subset the data if you only want to summarize the columns you mention:
summary(dat[c("sex", convert_to_numeric, "apoe4any")])
#  sex     age_onset  age_last_visit    age_death     apoe4any
#  1:2   Min.   :63   Min.   :70.00   Min.   :72.00   0:3
#  2:8   1st Qu.:66   1st Qu.:70.25   1st Qu.:80.00   1:7
#        Median :69   Median :75.50   Median :88.00
#        Mean   :70   Mean   :76.70   Mean   :83.33
#        3rd Qu.:73   3rd Qu.:81.00   3rd Qu.:89.00
#        Max.   :79   Max.   :89.00   Max.   :90.00
#        NA's   :6                    NA's   :7
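If you would rather have exactly the quantities listed in the question as a single one-row table, something along these lines should work. This is only a sketch under a few assumptions: the toy data frame below re-creates just the relevant columns from the question so the snippet runs on its own; the column names in cohort_summary are made up; 888 is treated as a "not applicable" code because it appears among the factor levels (check your codebook); and which sex code means male or female is not stated in the question, so the counts are labelled by code.

```r
# Toy copy of the relevant columns from the question (ages stored as factors)
dat <- data.frame(
  sex            = factor(c(2, 2, 2, 2, 2, 1, 1, 2, 2, 2)),
  age_onset      = factor(c(NA, NA, NA, NA, " 63", NA, " 79", NA, " 67", " 71")),
  age_last_visit = factor(c(82, 77, 74, 71, 70, 70, 89, 86, 70, 78)),
  age_death      = factor(c(NA, NA, NA, " 72", NA, NA, " 90", " 88", NA, NA)),
  apoe4any       = factor(c(0, 1, 0, 1, 1, 0, 1, 1, 1, 1))
)

# Convert the age factors to numeric via character
age_cols <- c("age_onset", "age_last_visit", "age_death")
dat[age_cols] <- lapply(dat[age_cols], function(x) as.numeric(as.character(x)))

# Assumption: 888 is a placeholder code, not a real age, so treat it as missing
dat[age_cols] <- lapply(dat[age_cols], function(x) replace(x, x %in% 888, NA))

# One-row summary; means ignore missing values
cohort_summary <- data.frame(
  n_sex_1             = sum(dat$sex == "1"),
  n_sex_2             = sum(dat$sex == "2"),
  mean_age_onset      = mean(dat$age_onset, na.rm = TRUE),
  mean_age_last_visit = mean(dat$age_last_visit, na.rm = TRUE),
  mean_age_death      = mean(dat$age_death, na.rm = TRUE),
  n_apoe4any          = sum(dat$apoe4any == "1")
)
print(cohort_summary)
```

For a true contingency table of two categorical columns, table(dat$sex, dat$apoe4any) gives the cross-tabulated counts.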