I have the following data in R:
gender <- c("Male","Female")
gender <- sample(gender, 5000, replace=TRUE, prob=c(0.45, 0.55))
gender <- as.factor(gender)
disease <- c("Yes","No")
disease <- sample(disease, 5000, replace=TRUE, prob=c(0.4, 0.6))
disease <- as.factor(disease)
status <- c("Immigrant","Citizen")
status <- sample(status, 5000, replace=TRUE, prob=c(0.3, 0.7))
status <- as.factor(status)
my_data = data.frame(gender, status, disease)
I want to make a table that shows:
I tried to do this with the following code:
t1 <- xtabs(disease ~ gender + status, data=my_data)
But I get this error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
‘sum’ not meaningful for factors
Can someone please show me what I am doing wrong and how to fix this?
Thank you!
As there are more columns and all of them are factors, use count from dplyr and then get the proportions:
library(dplyr)
library(tidyr)
my_data %>%
  dplyr::count(across(everything())) %>%
  pivot_wider(names_from = disease, values_from = n, values_fill = 0) %>%
  group_by(gender) %>%
  mutate(100 * across(No:Yes, proportions)) %>%
  ungroup()
-output
# A tibble: 4 × 4
  gender status       No   Yes
  <fct>  <fct>     <dbl> <dbl>
1 Female Citizen    69.4  72.4
2 Female Immigrant  30.6  27.6
3 Male   Citizen    70.4  68.7
4 Male   Immigrant  29.6  31.3
With xtabs, if we convert the column to integer, it could work as
apply(xtabs(n ~ disease + gender + status,
            transform(my_data, n = as.integer(disease))),
      c(1, 2), proportions) * 100
, , gender = Female

           disease
status            No      Yes
  Citizen   69.36724 72.41993
  Immigrant 30.63276 27.58007

, , gender = Male

           disease
status            No      Yes
  Citizen   70.40185 68.68687
  Immigrant 29.59815 31.31313
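As for why the original call fails: when the formula given to xtabs has a left-hand side, xtabs sums that variable within each cell, and a factor cannot be summed, hence the 'sum' not meaningful for factors error. Below is a minimal sketch of an alternative that avoids the integer conversion, assuming a one-sided formula (so xtabs simply counts rows) and numeric margins passed to proportions. The seed and the names n_obs, counts and pct are my own choices for illustration, not from the question, and the simulated values will differ from the output shown above.

```r
# Recreate data shaped like the question's (the seed is an arbitrary assumption)
set.seed(1)
n_obs <- 5000
my_data <- data.frame(
  gender  = factor(sample(c("Male", "Female"), n_obs, replace = TRUE, prob = c(0.45, 0.55))),
  status  = factor(sample(c("Immigrant", "Citizen"), n_obs, replace = TRUE, prob = c(0.3, 0.7))),
  disease = factor(sample(c("Yes", "No"), n_obs, replace = TRUE, prob = c(0.4, 0.6)))
)

# One-sided formula: nothing on the left, so xtabs counts rows per cell
counts <- xtabs(~ status + disease + gender, data = my_data)

# Percentages of each status within every disease x gender combination
# (margins 2 and 3 are disease and gender; requires R >= 4.0.0 for proportions)
pct <- 100 * proportions(counts, margin = c(2, 3))
round(pct, 1)
```

Each status column within a gender slice should sum to 100, which is a quick way to check that the margins are the ones intended.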