Search code examples
rdatatablecountpercentagekableextra

Create a table with counts and percentages with missing data


I am trying to create a table with preset dimensions and have R fill in the counts and percentages. This is for an R-markdown report.

Here is the code for my sample data.

#This is the most realistic data I could produce.
Maj <- rep("Major A", times=50)
set.seed(24601) 
Race <- sample(c("Asian","Black", "Am Indian","Hawiian" ,"Hispanic","White","Two or More Races","Not Reported"),
                 prob=c(.01,.1,.01,.01,.02,.80,.05,.01),size=50, replace = T)
Sex <- sample(c("Female","Male"), prob=c(.98,.02),size=50,replace=T)

Enroll_MajorA <- cbind(Maj,Sex,Race)

I need the table to calculate a count and a percent whether or not a given Race and Sex combination exists in the data set. Here is what I need it to look like.

Table Format

I have tried computing each value for the table individually and R-markdown gave me a "memory error". I have tried creating a count and percent table and combining them together but it does not give all of the Race/Sex combinations I need for the report. I am not sure where to go from here. Please Help!


Solution

  • You may use aggregate. You can keep the matrix untouched, since you can use as.data.frame, which coerces automatically into countable factors. The NROW (capitals) doesn't discriminate between matrices and vectors.

    m.agg <- do.call(data.frame, 
                     aggregate(. ~ Sex + Race, as.data.frame(Enroll_MajorA), function(x) 
                       c(count=as.integer(NROW(x)), share=NROW(x) / NROW(Enroll_MajorA))))
    

    To get the complete set, we may merge with an expand.grid, which we may want to clean up a little.

    res <- merge(as.data.frame(m.agg), expand.grid(Sex=c("Female", "Male"), 
                                                   Race=relevant.races), all=TRUE)  # `relevant.races` below
    res[, 3:4][is.na(res[, 3:4])] <- 0  # transform `NA` into 0 to be nice
    res[order(res[, "Race"]), ]  # order output
    #       Sex              Race Maj.count Maj.share
    # 1  Female             Black         2      0.04
    # 10   Male             Black         0      0.00
    # 2  Female           Hawiian         1      0.02
    # 3  Female          Hispanic         1      0.02
    # 11   Male          Hispanic         0      0.00
    # 4  Female Two or More Races         2      0.04
    # 12   Male Two or More Races         0      0.00
    # 5  Female             White        44      0.88
    # 13   Male             White         0      0.00
    # 6  Female             Asian         0      0.00
    # 14   Male             Asian         0      0.00
    # 7  Female        Am. Indian         0      0.00
    # 15   Male        Am. Indian         0      0.00
    # 8  Female          Hawaiian         0      0.00
    # 16   Male          Hawaiian         0      0.00
    # 9  Female      Not Reported         0      0.00
    # 17   Male      Not Reported         0      0.00
    

    Data

    relevant.races <- c("Asian","Black", "Am. Indian", "Hawaiian" , "Hispanic", "White", 
                        "Two or More Races", "Not Reported")
    
    Enroll_MajorA <- structure(c("Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
    "Major A", "Major A", "Major A", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "Female", "Female", 
    "Female", "Female", "Female", "Female", "Female", "White", "White", 
    "White", "Hawiian", "White", "White", "White", "White", "White", 
    "White", "White", "White", "White", "Two or More Races", "White", 
    "White", "White", "White", "White", "White", "White", "Hispanic", 
    "White", "White", "White", "White", "White", "White", "Two or More Races", 
    "White", "White", "White", "White", "White", "White", "White", 
    "White", "Black", "White", "White", "Black", "White", "White", 
    "White", "White", "White", "White", "White", "White", "White"
    ), .Dim = c(50L, 3L), .Dimnames = list(NULL, c("Maj", "Sex", 
    "Race")))