r dataframe dplyr group-by data-manipulation

R: Trimming dataframe based on most common column entry

I have the following dataset which I'm trying to trim (sample of the dataframe is below)

Index	Grade
Ace_1_1	A
Ace_1_1	A
Ace_1_1	B
Ace_1_1	C
Ace_1_2	A
Ace_1_2	C
Ace_1_2	C
Ace_1_3	B
Ace_1_3	B
Ace_2_2	C
Ace_2_2	A
Ace_2_2	B
Ace_2_5	C
Ace_2_5	C
Ace_2_5	A
Ace_2_5	A
Ace_3_1	D

df2 = structure(list(Index = c("Ace_1_1", "Ace_1_1", "Ace_1_1", "Ace_1_1", 
                               "Ace_1_2", "Ace_1_2", "Ace_1_2", "Ace_1_3", "Ace_1_3", "Ace_2_2", 
                               "Ace_2_2", "Ace_2_2", "Ace_2_5", "Ace_2_5", "Ace_2_5", "Ace_2_5", 
                               "Ace_3_1"), Grade = c("A", "A", "B", "C", "A", "C", "C", "B", 
                                "B", "C", "A", "B", "C", "C", "A", "A", "D")), class = "data.frame", row.names = c(NA,-17L))

I'm trying to trim it down such that for each index, only the most common grade would show. If there is a tie in frequency, the tied grades would be shown. If there's only one entry the entry would be show as well. The ideal output for the table above would be

Index	Grade
Ace_1_1	A
Ace_1_2	C
Ace_1_3	B
Ace_2_2	C
Ace_2_2	A
Ace_2_2	B
Ace_2_5	C
Ace_2_5	A
Ace_3_1	D

I'm using the group_by() function by counting the frequency with mutate (Frequency = n()) but I'm having trouble subsetting the most frequent count. Appreciate if anyone could help me count here either using dplyr or even base R, thanks!

Solution

Using data.table or dplyr, you can simply take the rows in each group where the count equals the max count:

library(dplyr)
count(df2,Index,Grade) %>% filter(n==max(n), .by=Index) %>% select(-n)

library(data.table)
setDT(df2)[, .N, .(Index, Grade)][, .SD[N==max(N), Grade], Index]

Output:

     Index  Grade
    <char> <char>
1: Ace_1_1      A
2: Ace_1_2      C
3: Ace_1_3      B
4: Ace_2_2      A
5: Ace_2_2      B
6: Ace_2_2      C
7: Ace_2_5      A
8: Ace_2_5      C
9: Ace_3_1      D