I have the following dataset:
a b
1 a
1 a
1 a
1 none
2 none
2 none
2 b
3 a
3 c
3 c
3 d
4 a
I want to get the most frequent value in b for any a and the second most frequent value of b for any a. in case two values in b have the same frequency I m indifferent about any of the two being considered the "first" or the "second".
in this case the expected output would be:
d2:
a first second
1 a none
2 none b
3 c a(or d, doesn't matter)
4 a NA
as you can see a=4 has just one value in b, thus I expect a NA in the output column "second" as there is no second most frequent value.
data:
a <- c(1,1,1,1,2,2,2,3,3,3,3,4)
b<- c("a","a", "a", "none", "none", "none", "b", "a", "c" , "c", "d","a")
d <- data.frame(a,b)
what I tried at the moment is the following
d1 <- d %>% group_by(a) %>% summarize ( first =names(which.max(table(b))) , second= names(which.max(table(b)[-which.max(table(b))] )))
but it doesn't work properly, any idea on how to do this?
You can count
number of rows for a
and b
combination and for each value of a
select 1st and 2nd value in summarise
.
library(dplyr)
d %>%
count(a, b, sort = TRUE) %>%
group_by(a) %>%
summarise(first = b[1],second = b[2])
# A tibble: 4 x 3
# a first second
# <dbl> <chr> <chr>
#1 1 a none
#2 2 none b
#3 3 c a
#4 4 a NA