Proportion Tables in R


I have the following data in R:

gender <- c("Male","Female")
gender <- sample(gender, 5000, replace=TRUE, prob=c(0.45, 0.55))
gender <- as.factor(gender)

disease <- c("Yes","No")
disease <- sample(disease, 5000, replace=TRUE, prob=c(0.4, 0.6))
disease <- as.factor(disease)

status <- c("Immigrant","Citizen")
status <- sample(status, 5000, replace=TRUE, prob=c(0.3, 0.7))
status <- as.factor(status)

my_data <- data.frame(gender, status, disease)

I want to make a table that shows:

  • What percent of male immigrants have the disease?
  • What percent of male non-immigrants have the disease?
  • What percent of female immigrants have the disease?
  • What percent of female non-immigrants have the disease?

I tried to do this with the following code:

 t1 <- xtabs(disease ~ gender + status, data=my_data)

But I get this error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  : 
  ‘sum’ not meaningful for factors

Can someone please show me what I am doing wrong and how to fix this?

Thank you!


Solution

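  • xtabs treats the left-hand side of the formula as a vector of counts to sum, so putting the factor disease there fails with "'sum' not meaningful for factors". Since all of the columns are factors, count the combinations with count from dplyr and then convert the counts to proportions: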

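    library(dplyr)
    library(tidyr)
    my_data %>%
       # count every gender/status/disease combination
       dplyr::count(across(everything())) %>%
       # one column per disease level (No, Yes)
       pivot_wider(names_from = disease, values_from = n, values_fill = 0) %>%
       # within each gender, convert each column to percentages
       group_by(gender) %>%
       mutate(100 * across(No:Yes, proportions)) %>%
       ungroup()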
    

    -output (within each gender, the No and Yes columns each sum to 100 across status)

    # A tibble: 4 × 4
      gender status       No   Yes
      <fct>  <fct>     <dbl> <dbl>
    1 Female Citizen    69.4  72.4
    2 Female Immigrant  30.6  27.6
    3 Male   Citizen    70.4  68.7
    4 Male   Immigrant  29.6  31.3
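
    The table above gives the share of each status within a gender and disease outcome. If what is wanted instead is the question's literal quantity, the percent of each gender/status group that has the disease, a minimal sketch could look like the following. The seed and the names disease_pct and pct_with_disease are assumptions added for illustration; they are not part of the original post.

    ```r
    library(dplyr)

    # Simulated data in the same shape as the question's my_data
    # (the seed is an assumption, added only for reproducibility).
    set.seed(1)
    my_data <- data.frame(
      gender  = factor(sample(c("Male", "Female"), 5000, replace = TRUE, prob = c(0.45, 0.55))),
      status  = factor(sample(c("Immigrant", "Citizen"), 5000, replace = TRUE, prob = c(0.3, 0.7))),
      disease = factor(sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)))
    )

    # mean() of a logical vector is the fraction of TRUE values, so this is
    # the percent of rows with disease == "Yes" in each gender/status group.
    disease_pct <- my_data %>%
      group_by(gender, status) %>%
      summarise(pct_with_disease = 100 * mean(disease == "Yes"), .groups = "drop")

    disease_pct
    ```

    This returns one row per gender/status combination, which maps directly onto the four bullet points in the question.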
    

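    With xtabs, the left-hand side of the formula must be a numeric vector of counts, so converting disease to integer also works. The integer codes (1 for No, 2 for Yes) only scale the sums by a constant within each disease level, and that constant cancels once proportions are taken:

    apply(xtabs(n ~ disease + gender + status,
      transform(my_data, n = as.integer(disease))), c(1, 2), proportions) * 100

    -output

    , , gender = Female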
    
               disease
    status            No      Yes
      Citizen   69.36724 72.41993
      Immigrant 30.63276 27.58007
    
    , , gender = Male
    
               disease
    status            No      Yes
      Citizen   70.40185 68.68687
      Immigrant 29.59815 31.31313
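
    A base R alternative that avoids the integer conversion is a one-sided formula, where xtabs simply counts rows for each combination of factors. The sketch below is one possible way to get the percent with the disease inside each gender/status group; the seed and the object names counts and pct are assumptions made for illustration. proportions() needs R 4.0.1 or later; prop.table() is the older equivalent.

    ```r
    # Simulated data in the same shape as the question's my_data
    # (the seed is an assumption, added only for reproducibility).
    set.seed(1)
    my_data <- data.frame(
      gender  = factor(sample(c("Male", "Female"), 5000, replace = TRUE, prob = c(0.45, 0.55))),
      status  = factor(sample(c("Immigrant", "Citizen"), 5000, replace = TRUE, prob = c(0.3, 0.7))),
      disease = factor(sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)))
    )

    # One-sided formula: nothing on the left, so xtabs counts rows
    # for each gender/status/disease combination instead of summing.
    counts <- xtabs(~ gender + status + disease, data = my_data)

    # margin = c(1, 2) divides by the gender/status totals, so the No and Yes
    # percentages sum to 100 within each gender/status cell.
    pct <- proportions(counts, margin = c(1, 2)) * 100

    # Flat view of the full table, then only the "Yes" slice.
    ftable(round(pct, 1))
    pct[, , "Yes"]
    ```

    The last line is a gender by status matrix whose entries are the percent of that group with the disease.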