Search code examples
rsum

sum for each category in factor variables


I would like to get a table that counts the frequencies of each factor category across all variables except id. I have a number of variables that each can contain 1, 0 or NA, and I want a table that gives me the frequency of each of those. Below I have changed NA to 2 because NA is not included as a category in factor variables and then I make each of the variables except id factors. I can't figure out the final step to get the table in 'want' below that simply counts for each variable how many 1s, how many 0s and how many NAs.

# have
df3=data.frame(id=c(1,2,3,4,5,6,7),
               ibd=c(0,1,1,0,0,1,1),
               ihd=c(1,1,0,NA,0,0,1),
               psoriasis=c(0,0,NA,0,0,1,1),
               mi=c(1,0,0,0,0,NA,1)
)

df3 %>% replace(is.na(.), 2)

df3 <- df3 %>%
  mutate(across(-c(id), as.factor))

# want
df5=data.frame(id=c("1s", "0s", "NAs"),
               ibd=c(4,3,0),
               ihd=c(3,3,1),
               psoriasis=c(2,4,1),
               mi=c(2,4,1)
)

Solution

  • library(tidyr); library(dplyr)
    
    pivot_longer(df3, -id) |>
      group_by(name) |>
      count(value) |>
      pivot_wider(names_from=name, values_from=n, values_fill=0)
    
    # A tibble: 3 × 5
      value   ibd   ihd    mi psoriasis
      <fct> <int> <int> <int>     <int>
    1 0         3     3     4         4
    2 1         4     3     2         2
    3 NA        0     1     1         1