I have a data frame with 4 groups (defined by categories "a" and "b" in column 1 and categories "X" and "Y" in column 2). I want to rank the attributes in column 3 by their values in column 4, but specifically within the groups in columns 1 and 2 (AX, AY, BX, BY), and then select only the top n (e.g., n = 2) values from each group.
arrange(col1, col2, desc(col4))
works to arrange the data, but because the data are not technically grouped, functions like top_n
return just the top n values of the entire list. I thought of using slice_max
but can't install the beta version of dplyr from GitHub on my restricted network. What is the best approach?
Original data:
col1 col2 col3 col4
a X pat 1
b Y dog 2
b X leg 3
a X hog 4
b Y egg 5
a Y log 6
b X map 7
b Y ice 8
b X mat 9
a Y sat 10
arrange(col1, col2, desc(col4))
gives
col1 col2 col3 col4
a X hog 4
a X pat 1
a Y sat 10
a Y log 6
b X mat 9
b X map 7
b X leg 3
b Y ice 8
b Y egg 5
b Y dog 2
but I cannot figure out how to filter this down to just the top 2 values in each group.
(example input code below)
col1 <- c('a','b','b','a','b','a','b','b','b','a')
col2 <- c('X','Y','X','X','Y','Y','X','Y','X','Y')
col3 <- c('pat','dog','leg','hog','egg','log','map','ice','mat','sat')
col4 <- c(1,2,3,4,5,6,7,8,9,10)
df <- data.frame(col1,col2,col3,col4)
colA <- c('a','a','a','a','b','b','b','b','b','b')
colB <- c('X','X','Y','Y','X','X','X','Y','Y','Y')
colC <- c('hog','pat','sat','log','mat','map','leg','ice','egg','dog')
colD <- c(4,1,10,6,9,7,3,8,5,2)
df1 <- data.frame(colA,colB,colC,colD)
We can use top_n
after grouping by 'colA' and 'colB':
library(dplyr)
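# df1 is the data frame that actually has columns colA..colD
df1 %>%
  group_by(colA, colB) %>%
  # name the ranking column explicitly instead of relying on the last column
  top_n(2, colD)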
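If you would rather not depend on top_n, a minimal sketch that should also work on older dplyr releases is to rank the rows inside each group and keep the first n. It uses the unsorted data from the question (col1 to col4); the names n_top and top_by_group are just illustrative, not anything from dplyr.

```r
library(dplyr)

# Sample data from the question (the unsorted version).
df <- data.frame(
  col1 = c('a','b','b','a','b','a','b','b','b','a'),
  col2 = c('X','Y','X','X','Y','Y','X','Y','X','Y'),
  col3 = c('pat','dog','leg','hog','egg','log','map','ice','mat','sat'),
  col4 = c(1,2,3,4,5,6,7,8,9,10),
  stringsAsFactors = FALSE
)

n_top <- 2  # illustrative name: how many rows to keep per group

top_by_group <- df %>%
  group_by(col1, col2) %>%
  # row_number(desc(col4)) ranks rows within each group, largest col4 first;
  # ties get distinct ranks, so at most n_top rows per group survive
  filter(row_number(desc(col4)) <= n_top) %>%
  ungroup() %>%
  # sort only for display; the filter above does not depend on row order
  arrange(col1, col2, desc(col4))

print(top_by_group)
```

Unlike top_n, this never returns more than n rows per group when there are ties. If you can eventually get dplyr 1.0.0 or later from CRAN, slice_max(col4, n = 2) after the same group_by does the job; top_n is marked as superseded there.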