
Using R to count character occurrences in multiple columns of a data.frame


I'm new to R and have a data.frame with 100 columns. Every column holds character data, and I am trying to summarize how many times each value occurs in each column. I would like to summarize all the columns at once instead of writing separate code for each column. I've tried

occurrences <- table(unlist(my_df)) 

but this gives me one set of counts for all columns combined, not a separate set of counts for each column.

When I summarize a single column, the output looks the way I want, but only for that one column:

BG_occurrences <- table(my_df$BG)
BG_occurrences
   1   na SOME 
  17   20    1

Is there a way to get this summary for every column at once? I want the output to look something like this:

       1   na  SOME
BG:   17   20     1
sBG:  23   10     5
BX:   18   20     0
NG:   21   11     6

Solution

  • We can use lapply/sapply to loop over the columns and apply table() to each one:

    lapply(my_df, table)
    

    Or it can be done in a vectorized way, cross-tabulating the column index of every cell against its value:

    table(c(col(my_df)), unlist(my_df))
    

    Or with the tidyverse:

    library(dplyr)
    library(tidyr)
    my_df %>%
       pivot_longer(cols = everything()) %>%
       count(name, value)
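The count() call above returns the counts in long format, one row per column/value pair. If the wide layout from the question is wanted instead (one row per column, one count column per value), the result can probably be reshaped with tidyr's pivot_wider(). This is a minimal sketch; example_df is hypothetical stand-in data, and values that never appear in a column are filled with 0 through values_fill:

```r
library(dplyr)
library(tidyr)

# Hypothetical example data standing in for the real my_df
example_df <- data.frame(
  BG  = c("1", "na", "1", "SOME"),
  sBG = c("na", "na", "1", "1"),
  stringsAsFactors = FALSE
)

counts_wide <- example_df %>%
  pivot_longer(cols = everything()) %>%  # one row per (column name, value) cell
  count(name, value) %>%                 # n = occurrences of each value per column
  pivot_wider(names_from = value,        # one output column per distinct value
              values_from = n,
              values_fill = 0)           # values absent from a column become 0

counts_wide
```

Replace example_df with my_df to run it on the real data. pivot_longer() needs all the stacked columns to share a type, which holds here because every column is character.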
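lapply(my_df, table) returns a list with one table per column. sapply(my_df, table) only gives a properly aligned matrix when every column contains the same set of distinct values; otherwise it returns a list, or, if the tables merely happen to have equal lengths, a matrix whose row labels may not match the counts. One way around this, sketched below on hypothetical data (example_df), is to convert each column to a factor with a shared set of levels so that every table has the same entries (zero counts included). sapply() can then bind them into a matrix, and t() turns the original columns into rows as in the requested output:

```r
# Hypothetical example data standing in for the real my_df
example_df <- data.frame(
  BG  = c("1", "na", "1", "SOME"),
  sBG = c("na", "na", "1", "1"),
  stringsAsFactors = FALSE
)

# Every distinct value that appears anywhere in the data frame
all_values <- sort(unique(unlist(example_df)))

# Tabulate each column against the shared levels so all tables have the
# same length and order; sapply() then builds a (values x columns) matrix
counts <- sapply(example_df, function(x) table(factor(x, levels = all_values)))

# Transpose so each original column becomes a row
counts <- t(counts)
counts
```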
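In the vectorized table(c(col(my_df)), unlist(my_df)) call, col(my_df) gives the column index of every cell and unlist(my_df) flattens the data frame column by column, so the two vectors line up cell for cell. The rows of the resulting table are therefore labelled by column position (1, 2, 3, ...). A possible tweak, shown here on hypothetical data (example_df), is to use those positions to index names(my_df), wrapping the result in a factor so the rows keep the original column order instead of being sorted alphabetically:

```r
# Hypothetical example data standing in for the real my_df
example_df <- data.frame(
  BG  = c("1", "na", "1", "SOME"),
  sBG = c("na", "na", "1", "1"),
  stringsAsFactors = FALSE
)

# col() returns the column index of every cell; c() flattens it in the same
# column-major order that unlist() uses for the data frame's values
col_names <- factor(names(example_df)[c(col(example_df))],
                    levels = names(example_df))  # keep original column order

# Rows: column names; columns: distinct values; cells: occurrence counts
counts <- table(col_names, unlist(example_df))
counts
```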