Get demographic information by condition using table() in R


I have a dataframe:

df <- data.frame(ID = 1:20,
                 Ethnicity = rep(c("White", "Asian", "Black", "Hispanic", "Other"), times = 20/5),
                 Age = 1:20,
                 Set = rep(c(1, 2, 3, 4), times = 20/4)
)

I want to know the ethnicity and age breakdown by Set. I usually use table(df$Ethnicity), but how do I do this by Set?

The desired output for ethnicity is a table with the percentage of each ethnicity within each Set. For example, in this case, every Set will have 20% White, 20% Asian, 20% Black, 20% Hispanic, 20% Other. For age, the output should be a table with the mean age of each Set.

Thank you!


Solution

  • You can use prop.table with margin 2, which gives proportions within each column (here, each Set):

    prop.table(table(df$Ethnicity, df$Set), 2)
    
                 1   2   3   4
      Asian    0.2 0.2 0.2 0.2
      Black    0.2 0.2 0.2 0.2
      Hispanic 0.2 0.2 0.2 0.2
      Other    0.2 0.2 0.2 0.2
      White    0.2 0.2 0.2 0.2
    

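    For a numeric variable summarised by a categorical one, you can use by. The example groups Age by Ethnicity; pass df$Set as the second argument to group by Set instead: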

    by(df$Age, df$Ethnicity, mean)
    
    df$Ethnicity: Asian
    [1] 9.5
    ------------------------------------------------------------- 
    df$Ethnicity: Black
    [1] 10.5
    ------------------------------------------------------------- 
    df$Ethnicity: Hispanic
    [1] 11.5
    ------------------------------------------------------------- 
    df$Ethnicity: Other
    [1] 12.5
    ------------------------------------------------------------- 
    df$Ethnicity: White
    [1] 8.5
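
To match the output the question asks for (percentages rather than proportions, and mean age per Set rather than per Ethnicity), something along these lines may work. This is a minimal base R sketch that rebuilds the question's df; the names eth_pct and age_by_set are illustrative only, and tapply and aggregate are used here as alternatives to by because they return a compact vector or data frame.

```r
# Rebuild the data frame from the question
df <- data.frame(
  ID = 1:20,
  Ethnicity = rep(c("White", "Asian", "Black", "Hispanic", "Other"), times = 20 / 5),
  Age = 1:20,
  Set = rep(c(1, 2, 3, 4), times = 20 / 4)
)

# Ethnicity percentages within each Set:
# margin = 2 normalises each column (Set) so it sums to 1, then scale to percent
eth_pct <- round(100 * prop.table(table(df$Ethnicity, df$Set), margin = 2), 1)
eth_pct

# Mean age per Set, returned as a vector named by Set
age_by_set <- tapply(df$Age, df$Set, mean)
age_by_set
#  1  2  3  4
#  9 10 11 12

# The same summary as a data frame with columns Set and Age
aggregate(Age ~ Set, data = df, FUN = mean)
```

With this data every cell of eth_pct is 20, since each Set contains exactly one person of each ethnicity.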