Tags: r, dataframe, dplyr, summarize

Summarise proportions of character values across columns in table


In this kind of data frame:

df <- data.frame(
     w1 = c("A","A","B","C","A"),
     w2 = c("C","A","A","C","C"),
     w3 = c("C","A","B","C","B")
   ) 

I need to calculate, for every column, the within-column proportion of each character value, expressed as a percentage. Interestingly, the following code works with my large actual data set but throws an error with the toy data above:

library(dplyr)

df %>%
  summarise(across(everything(), ~prop.table(table(.))*100))

What I'm looking for is a data frame with the percentage of each value in each column, plus a column indicating the values:

       w1  w2  w3
1  A   60  40  20
2  B   20   0  40
3  C   20  60  40

Solution

  • Here's a workaround using tidyverse packages:

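    library(dplyr)  # mutate(.by = ) requires dplyr >= 1.1.0
    library(tidyr)
    
    # Reshape to long format: one row per cell (name = column, value = cell content)
    pivot_longer(df, everything()) |> 
        # Count each value within each column
        count(value, name) |>
        # Convert counts to within-column percentages
        mutate(n = n / sum(n) * 100, .by = name) |>
        # Back to wide format; values absent from a column get 0
        pivot_wider(names_from = name, values_from = n, values_fill = 0)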
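
As for why the original `summarise()` call fails on the toy data: `w2` contains only two distinct values (A and C), so `table()` returns a result of length 2 for that column and length 3 for the others, and those lengths cannot be combined. The large data set presumably works because every value occurs in every column. A minimal base R sketch of an alternative, assuming it is acceptable to fix a shared set of factor levels up front (the names `lv`, `props` and `res` are only illustrative):

```r
df <- data.frame(
  w1 = c("A", "A", "B", "C", "A"),
  w2 = c("C", "A", "A", "C", "C"),
  w3 = c("C", "A", "B", "C", "B")
)

# All values that occur anywhere in the data frame
lv <- sort(unique(unlist(df)))

# With identical factor levels in every column, table() returns one
# count per level (0 where a value is absent), so all results have
# the same length.
props <- sapply(df, function(x) {
  as.numeric(prop.table(table(factor(x, levels = lv))) * 100)
})

# Attach the values as their own column
res <- data.frame(value = lv, props)
res
```

For the toy data this gives one row per value (A, B, C) and one percentage column per original column, with 0 for B in `w2`.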