
Summarize each category of rows in one column using R


I'm wondering whether this is possible in R. I have 2 columns: column A (primaryhistory2.DEPT) contains categorical data, and column B (primaryhistry2.ACT.ENROLL) contains numbers and NAs.

I want a summary of column B for each category in column A. For example, for "NUT" in column A, I want to see the min, max, mean, median, number of NAs, etc., and I want this for every category, like the output of the summary() command.

Not sure if this is possible. Thank you all in advance!

@Moody_Mudskipper The results are what I'm looking for, but without column names they're hard to read.

And the base R version doesn't count NAs, even though my file has a lot of them.
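For the base R side, here is a minimal sketch (no packages needed) that returns one row per category with named columns, including an NA count. It assumes a small made-up data frame `df` that uses the question's column names; the values and the names `groups`, `rows`, and `dept_summary` are hypothetical. A department whose values are all NA gets NA statistics rather than the `Inf` (with a warning) or `NaN` that `min()` or `mean()` return on an empty vector.

```r
# Hypothetical toy data: the column names follow the question,
# but the values are made up for illustration.
df <- data.frame(
  primaryhistory2.DEPT      = c("NUT", "NUT", "NUT", "BIO", "BIO", "CHEM"),
  primaryhistry2.ACT.ENROLL = c(10, NA, 30, 5, 15, NA)
)

# Split the enrollment numbers into one vector per department.
groups <- split(df$primaryhistry2.ACT.ENROLL, df$primaryhistory2.DEPT)

# Build one named row of statistics per department.
rows <- lapply(names(groups), function(dept) {
  x  <- groups[[dept]]
  ok <- x[!is.na(x)]            # non-missing values only
  has_data <- length(ok) > 0    # guard against all-NA departments
  data.frame(
    DEPT   = dept,
    n      = length(x),
    n_NA   = sum(is.na(x)),
    min    = if (has_data) min(ok)    else NA_real_,
    max    = if (has_data) max(ok)    else NA_real_,
    mean   = if (has_data) mean(ok)   else NA_real_,
    median = if (has_data) median(ok) else NA_real_
  )
})

# Stack the rows into a single data frame with named columns.
dept_summary <- do.call(rbind, rows)
print(dept_summary)
```

Because every statistic is a named column, the result stays readable, and `n_NA` makes the missing values visible per department.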


Solution

  • Very possible using the dplyr library:

    library(dplyr)
    # One row per department; NAs are ignored when computing each statistic
    most.of.the.answer = df %>%
        group_by(primaryhistory2.DEPT) %>%
        summarise(min    = min(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
                  max    = max(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
                  mean   = mean(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
                  median = median(primaryhistry2.ACT.ENROLL, na.rm = TRUE))
    

    (This assumes your data frame is called df.)

    To count the NAs, use dplyr's filter() function:

    count.NAs = df %>% filter(is.na(primaryhistry2.ACT.ENROLL)) %>%
        group_by(primaryhistory2.DEPT) %>%
        summarise(count.NA = n())
    

    I'll leave it to you to merge the two dataframes.
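A minimal sketch of that merge in base R, assuming the two results keep the column names used above. The small data frames below are made-up stand-ins, not real output. Departments with no NAs never appear in count.NAs (the filter() step removes them), so a left join leaves their count as NA, which is then replaced with 0.

```r
# Made-up stand-ins for the two data frames built above.
most.of.the.answer <- data.frame(
  primaryhistory2.DEPT = c("BIO", "CHEM", "NUT"),
  min    = c(5, 8, 10),
  max    = c(15, 12, 30),
  mean   = c(10, 10, 20),
  median = c(10, 10, 20)
)
# Only departments that actually contain NAs show up here.
count.NAs <- data.frame(
  primaryhistory2.DEPT = c("CHEM", "NUT"),
  count.NA = c(1L, 2L)
)

# Left join: keep every department from the summary table.
merged <- merge(most.of.the.answer, count.NAs,
                by = "primaryhistory2.DEPT", all.x = TRUE)

# Departments missing from count.NAs had no NAs, so their count is 0.
merged$count.NA[is.na(merged$count.NA)] <- 0L
print(merged)
```

With dplyr loaded, `left_join(most.of.the.answer, count.NAs, by = "primaryhistory2.DEPT")` performs the same join; the NA-to-0 replacement is still needed afterwards.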