This is the expected output of dplyr::top_n
To select Top 2
> mtcars %>% dplyr::arrange(desc(mpg)) %>% dplyr::top_n(2, mpg)
mpg cyl disp hp drat wt qsec vs am gear carb
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
To select Top 3
> mtcars %>% dplyr::arrange(desc(mpg)) %>% dplyr::top_n(3, mpg)
mpg cyl disp hp drat wt qsec vs am gear carb
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
But why do I get the same four rows when I select Top 4?
> mtcars %>% dplyr::arrange(desc(mpg)) %>% dplyr::top_n(4, mpg)
mpg cyl disp hp drat wt qsec vs am gear carb
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
I expected this
mpg cyl disp hp drat wt qsec vs am gear carb
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Can anybody please explain what I am missing?
top_n is superseded and should not be used; use slice_max instead.
That said, slice_max(mtcars, mpg, n = 4) will give the same result as top_n(mtcars, mpg, n = 4). This is because, under the hood, both use dplyr::min_rank to calculate ranks. slice_max(mtcars, mpg, n = 4) is equivalent, up to row order, to mtcars %>% filter(min_rank(desc(mpg)) <= 4).
min_rank handles ties like so (see ?min_rank):
min_rank() gives every tie the same (smallest) value so that c(10, 20, 20, 30) gets ranks c(1, 2, 2, 4). It's the way that ranks are usually computed in sports and is equivalent to rank(ties.method = "min").
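To make the quoted behaviour concrete, here is a minimal sketch that runs the documentation's own example vector through min_rank and through the base R call it is said to match. It assumes only that dplyr is installed; the variable names are illustrative.

```r
library(dplyr)

x <- c(10, 20, 20, 30)

# Tied values share the smallest rank, and the next rank is skipped.
r_dplyr <- min_rank(x)
r_dplyr
# 1 2 2 4

# The base R equivalent mentioned in ?min_rank.
r_base <- rank(x, ties.method = "min")
all(r_dplyr == r_base)
```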
In your case of n = 4, the call returns 4 rows, because that's what it should return. min_rank(desc(c(33.9, 32.4, 30.4, 30.4, 27.3))) returns 1 2 3 3 5, so the fifth observation has rank 5, which is not <= 4, and it is dropped.
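The same reasoning can be checked directly on mtcars. The sketch below ranks the five highest mpg values, counts how many rows survive each cutoff, and compares slice_max with the filter form given earlier. It assumes dplyr 1.0.0 or later (for slice_max); the helper names top_mpg, ranks, rows_kept, by_slice and by_filter are illustrative only.

```r
library(dplyr)

# The five highest mpg values in mtcars, in descending order.
top_mpg <- sort(mtcars$mpg, decreasing = TRUE)[1:5]

# Ranks with ties sharing the smallest value: there is no rank 4.
ranks <- min_rank(desc(top_mpg))
ranks
# 1 2 3 3 5

# Number of rows kept for each cutoff n = 1..5.
rows_kept <- sapply(1:5, function(n) sum(ranks <= n))
rows_kept
# 1 2 4 4 5

# slice_max and the explicit filter keep the same rows for n = 4.
by_slice  <- slice_max(mtcars, mpg, n = 4)
by_filter <- mtcars %>% filter(min_rank(desc(mpg)) <= 4)
nrow(by_slice) == nrow(by_filter)
```

Because no value has rank 4, the cutoffs n = 3 and n = 4 select exactly the same four rows.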
How to get the wanted result? You can use dense_rank, which handles ties differently: it leaves no gaps between ranks, so the same values are ranked 1 2 3 3 4.
mtcars %>% filter(dense_rank(desc(mpg)) <= 4)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
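Note that filter keeps the rows in their original mtcars order, which is why the output above is not sorted by mpg. As a small sketch (not the only way to do it), the two ranking functions can be compared side by side and the result sorted explicitly with arrange; top_mpg and top4_dense are illustrative names.

```r
library(dplyr)

top_mpg <- c(33.9, 32.4, 30.4, 30.4, 27.3)

r_min   <- min_rank(desc(top_mpg))    # 1 2 3 3 5 (gap after the tie)
r_dense <- dense_rank(desc(top_mpg))  # 1 2 3 3 4 (no gap)

# Keep the four highest distinct mpg values, then sort for display.
top4_dense <- mtcars %>%
  filter(dense_rank(desc(mpg)) <= 4) %>%
  arrange(desc(mpg))

top4_dense$mpg
# 33.9 32.4 30.4 30.4 27.3
```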