
How to create a summarized demographic table in R


I have this data from a cohort of Alzheimer's disease patients. I would like to create a summary table (or contingency table) that shows the information in this data. For this cohort I would like to see: the number of males and females, the average age at onset, the average age at last visit, the average age at death, and the number of samples (IID) with apoe4any. What approach should I take to create such a table in R?

dat <- structure(list(IID = structure(1:10, .Names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"), .Label = c("NACC000875", 
"NACC003779", "NACC006805", "NACC008215", "NACC010067", "NACC010592", 
"NACC011413", "NACC015383", "NACC017476", "NACC017538"), class = "factor"), 
    cohort = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
    ), .Label = "ADC8_AA", class = "factor"), sex = structure(c(`1` = 2L, 
    `2` = 2L, `3` = 2L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 1L, 
    `8` = 2L, `9` = 2L, `10` = 2L), .Label = c("1", "2"), class = "factor"), 
    status = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 2L, `6` = 1L, `7` = 2L, `8` = 1L, `9` = 2L, `10` = 2L
    ), .Label = c("1", "2"), class = "factor"), Race = structure(c(`1` = 1L, 
    `2` = 1L, `3` = 1L, `4` = 1L, `5` = 1L, `6` = 1L, `7` = 1L, 
    `8` = 1L, `9` = 1L, `10` = 1L), .Label = "2", class = "factor"), 
    Ethnicity = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
    ), .Label = "0", class = "factor"), age_onset = structure(c(NA, 
    NA, NA, NA, 1L, NA, 4L, NA, 2L, 3L), .Label = c(" 63", " 67", 
    " 71", " 79", "888"), class = "factor"), age_last_visit = structure(c(`1` = 6L, 
    `2` = 4L, `3` = 3L, `4` = 2L, `5` = 1L, `6` = 1L, `7` = 8L, 
    `8` = 7L, `9` = 1L, `10` = 5L), .Label = c("70", "71", "74", 
    "77", "78", "82", "86", "89"), class = "factor"), age_death = structure(c(NA, 
    NA, NA, 1L, NA, NA, 3L, 2L, NA, NA), .Label = c(" 72", " 88", 
    " 90", "888"), class = "factor"), apoe4any = structure(c(`1` = 1L, 
    `2` = 2L, `3` = 1L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 2L, 
    `8` = 2L, `9` = 2L, `10` = 2L), .Label = c("0", "1"), class = "factor")), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")

Solution

  • R uses the factor class for categorical data. Your age columns are currently factors; if you convert them to numeric, summary(dat) will give you most of what you want.

    convert_to_numeric = c("age_onset", "age_last_visit", "age_death")
    dat[convert_to_numeric] = lapply(dat[convert_to_numeric], function(x) as.numeric(as.character(x)))
    summary(dat)
     #         IID        cohort   sex   status Race   Ethnicity   age_onset  age_last_visit 
     # NACC000875:1   ADC8_AA:10   1:2   1:6    2:10   0:10      Min.   :63   Min.   :70.00  
     # NACC003779:1                2:8   2:4                     1st Qu.:66   1st Qu.:70.25  
     # NACC006805:1                                              Median :69   Median :75.50  
     # NACC008215:1                                              Mean   :70   Mean   :76.70  
     # NACC010067:1                                              3rd Qu.:73   3rd Qu.:81.00  
     # NACC010592:1                                              Max.   :79   Max.   :89.00  
     # (Other)   :4                                              NA's   :6                   
     #   age_death     apoe4any
     # Min.   :72.00   0:3     
     # 1st Qu.:80.00   1:7     
     # Median :88.00           
     # Mean   :83.33           
     # 3rd Qu.:89.00           
     # Max.   :90.00           
     # NA's   :7            
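The factor levels of age_onset and age_death include "888", although none of the ten sample rows uses it. If 888 is a "not applicable / unknown" code (an assumption here; check the cohort's data dictionary), converting it straight to numeric would make it count as a real age and inflate the means. A minimal sketch that recodes it to NA during conversion; the helper name `to_age` and its `missing_codes` argument are hypothetical.

```r
# Hypothetical helper: convert a factor of ages to numeric and treat
# sentinel codes (assumed here to be 888) as missing.
to_age <- function(x, missing_codes = 888) {
  x <- as.numeric(as.character(x))  # labels such as " 63" parse to 63
  x[x %in% missing_codes] <- NA     # assumed missing-value code -> NA
  x
}

# Small example using level labels that appear in the question's data.
ages <- factor(c(" 63", "888", NA, " 71"))
to_age(ages)
#> [1] 63 NA NA 71
```

On the full data it would replace the anonymous function in the answer: `dat[convert_to_numeric] <- lapply(dat[convert_to_numeric], to_age)`.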
    

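    See the common FAQ on converting factors to numeric for an explanation of this conversion: as.numeric() applied directly to a factor returns its internal integer codes, so the factor is converted to character first.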
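A quick illustration, using three of the age_onset labels from the question, of what each conversion returns. The last line is the `as.numeric(levels(f))[f]` form that R FAQ 7.10 recommends as slightly more efficient, because it converts each distinct level only once; `f` is just an illustrative name.

```r
# Factor labels as they appear in the question's age_onset column.
f <- factor(c(" 63", " 79", " 67"))

as.numeric(f)                # internal level codes, not ages
#> [1] 1 3 2
as.numeric(as.character(f))  # the conversion used in this answer
#> [1] 63 79 67
as.numeric(levels(f))[f]     # equivalent: convert the levels, then index by code
#> [1] 63 79 67
```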

    You can also subset the data if you only want to summarize the columns you mention:

    summary(dat[c("sex", convert_to_numeric, "apoe4any")])
     # sex     age_onset  age_last_visit    age_death     apoe4any
     # 1:2   Min.   :63   Min.   :70.00   Min.   :72.00   0:3     
     # 2:8   1st Qu.:66   1st Qu.:70.25   1st Qu.:80.00   1:7     
     #       Median :69   Median :75.50   Median :88.00           
     #       Mean   :70   Mean   :76.70   Mean   :83.33           
     #       3rd Qu.:73   3rd Qu.:81.00   3rd Qu.:89.00           
     #       Max.   :79   Max.   :89.00   Max.   :90.00           
     #       NA's   :6                    NA's   :7
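summary() still spreads these numbers over several columns. If you want one compact table with exactly the items listed in the question, a base-R sketch like the one below builds it directly. To run on its own it re-creates only the needed columns, decoded by hand from the question's dput. It assumes sex is coded 1 = male, 2 = female and that apoe4any = 1 marks carriers of at least one APOE e4 allele; both codings are assumptions to verify against the data dictionary. The helper `mean_sd` is hypothetical.

```r
# Needed columns decoded from the question's dput (ages already numeric).
dat <- data.frame(
  sex            = factor(c(2, 2, 2, 2, 2, 1, 1, 2, 2, 2)),
  age_onset      = c(NA, NA, NA, NA, 63, NA, 79, NA, 67, 71),
  age_last_visit = c(82, 77, 74, 71, 70, 70, 89, 86, 70, 78),
  age_death      = c(NA, NA, NA, 72, NA, NA, 90, 88, NA, NA),
  apoe4any       = factor(c(0, 1, 0, 1, 1, 0, 1, 1, 1, 1))
)

# Hypothetical helper: "mean (SD), n = non-missing count", ignoring NAs.
mean_sd <- function(x) {
  sprintf("%.1f (%.1f), n = %d",
          mean(x, na.rm = TRUE), sd(x, na.rm = TRUE), sum(!is.na(x)))
}

# Assumed codings: sex 1 = male, 2 = female; apoe4any 1 = e4 carrier.
demo <- data.frame(
  characteristic = c("Males, n", "Females, n",
                     "Age at onset, mean (SD)",
                     "Age at last visit, mean (SD)",
                     "Age at death, mean (SD)",
                     "apoe4any = 1, n"),
  value = c(sum(dat$sex == "1"), sum(dat$sex == "2"),
            mean_sd(dat$age_onset),
            mean_sd(dat$age_last_visit),
            mean_sd(dat$age_death),
            sum(dat$apoe4any == "1"))
)

print(demo, right = FALSE, row.names = FALSE)
```

Reporting the non-missing n next to each mean makes it visible that, in this sample, only 4 patients have an age at onset and 3 an age at death. Packages such as tableone (`CreateTableOne()`) and gtsummary (`tbl_summary()`) produce similar tables with less code, if an extra dependency is acceptable.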