I'm wondering if this is possible in R:
I have 2 columns. Column A (primaryhistory2.DEPT) has a bunch of categorical data, and column B (primaryhistry2.ACT.ENROLL) has numbers and NAs.
I want to get a summary of column B for each category in column A.
For example, for "NUT" in column A, I want to see min, max, mean, median, NAs, etc. And I would like to see this for every category, like the output you get from the summary() command.
I'm not sure if this is possible. Thank you all in advance!
@Moody_Mudskipper
The results are what I'm looking for, but without column names they're hard to read.
And the base R version doesn't count NAs, and I do see a lot of NAs in my file.
Very possible using the dplyr library:
library(dplyr)
most.of.the.answer = df %>%
  group_by(primaryhistory2.DEPT) %>%
  summarise(min = min(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
            max = max(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
            mean = mean(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
            median = median(primaryhistry2.ACT.ENROLL, na.rm = TRUE))
(assuming your dataframe is called df)
For counting NAs, try dplyr's filter function:
count.NAs = df %>%
  filter(is.na(primaryhistry2.ACT.ENROLL)) %>%
  group_by(primaryhistory2.DEPT) %>%
  summarise(count.NA = n())
I'll leave it to you to merge the two dataframes.
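For anyone who wants the merge spelled out, here is a minimal sketch. The toy data frame and the names stats, merged, and one.pass are hypothetical and only for illustration. One thing to watch: the filter approach produces no row for a department that has no NAs, so a left join followed by replacing the missing counts with 0 is probably the safest way to combine the two. Counting NAs inside the same summarise call with sum(is.na(...)) should give the same result without any merge.

```r
library(dplyr)

# Hypothetical toy data standing in for the real data frame
df <- data.frame(
  primaryhistory2.DEPT = c("NUT", "NUT", "NUT", "BIO", "BIO"),
  primaryhistry2.ACT.ENROLL = c(10, NA, 30, 5, 15)
)

# Summary statistics per department, ignoring NAs
stats <- df %>%
  group_by(primaryhistory2.DEPT) %>%
  summarise(min = min(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
            max = max(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
            mean = mean(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
            median = median(primaryhistry2.ACT.ENROLL, na.rm = TRUE))

# NA counts per department (departments without NAs get no row here)
count.NAs <- df %>%
  filter(is.na(primaryhistry2.ACT.ENROLL)) %>%
  group_by(primaryhistory2.DEPT) %>%
  summarise(count.NA = n())

# Left join keeps every department; missing counts become 0
merged <- stats %>%
  left_join(count.NAs, by = "primaryhistory2.DEPT") %>%
  mutate(count.NA = coalesce(count.NA, 0L))

# Alternative: count NAs in the same summarise call, no merge needed
one.pass <- df %>%
  group_by(primaryhistory2.DEPT) %>%
  summarise(mean = mean(primaryhistry2.ACT.ENROLL, na.rm = TRUE),
            count.NA = sum(is.na(primaryhistry2.ACT.ENROLL)))

print(merged)
```

Note that if a department contains only NAs, min and max with na.rm = TRUE return Inf and -Inf with a warning, so those groups may need separate handling.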