I have a data frame df. I would like to select the 3 rows with the smallest values in column p.
df
       p  b
as   0.6 ab
yu   0.3 bc
hy  0.05 ak
get  0.7 ka
result
       p  b
as   0.6 ab
yu   0.3 bc
hy  0.05 ak
Two approaches. The first keeps every row whose p is at or below the third-smallest value:
df[df$p <= sort(df$p)[3],]
#       p  b
# as 0.60 ab
# yu 0.30 bc
# hy 0.05 ak
One problem with this is that when there are ties (for third) in p, you will get more than 3 rows. Also, this will not work well when there are fewer than 3 rows: sort(df$p)[3] is then NA, so the comparison returns NA and you get rows of NA instead of your data.
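Both caveats can be reproduced with a small sketch. The row "tie" and the objects df_tie, df_small and k are hypothetical additions for illustration, not part of the question; the guard at the end is one possible workaround under those assumptions, not the only one.

```r
# Data from the question, with the row names shown there.
df <- data.frame(
  p = c(0.6, 0.3, 0.05, 0.7),
  b = c("ab", "bc", "ak", "ka"),
  row.names = c("as", "yu", "hy", "get")
)

# Hypothetical extra row that ties with "as" for third place.
df_tie <- rbind(df, data.frame(p = 0.6, b = "zz", row.names = "tie"))
res_tie <- df_tie[df_tie$p <= sort(df_tie$p)[3], ]
nrow(res_tie)  # 4 rows, because both 0.6 rows pass the filter

# Fewer than 3 rows: the threshold is NA, so every comparison is NA.
df_small <- df[1:2, ]
res_small <- df_small[df_small$p <= sort(df_small$p)[3], ]
all(is.na(res_small$p))  # TRUE, the result is rows of NA

# One possible guard: never ask for more rows than exist.
k <- min(3, nrow(df_small))
res_guard <- df_small[df_small$p <= sort(df_small$p)[k], ]
```

With the guard, res_guard simply contains both rows of df_small. Ties at the threshold can still return more than k rows.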
Another approach, if you don't care about the order:
head(df[order(df$p),], n = 3)
which has the advantage that it always returns min(3, nrow(df)) rows, so it also works on short data frames. One problem with this is that it will not tell you that there is a tie; it just caps the number of rows.
(One could mitigate the re-ordering by adding a column that records the original row order, then re-sorting on that column after head.)
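A minimal sketch of that idea, assuming a helper column named orig_order (a hypothetical name, not something from the question). The last line shows an equivalent shortcut that sorts the selected row indices instead of adding a column.

```r
# Data from the question, with the row names shown there.
df <- data.frame(
  p = c(0.6, 0.3, 0.05, 0.7),
  b = c("ab", "bc", "ak", "ka"),
  row.names = c("as", "yu", "hy", "get")
)

# Record the original position of every row (hypothetical helper column).
df2 <- df
df2$orig_order <- seq_len(nrow(df2))

# Take the 3 smallest by p, then restore the original order.
top3 <- head(df2[order(df2$p), ], n = 3)
top3 <- top3[order(top3$orig_order), ]
top3$orig_order <- NULL  # drop the helper column again

# Shortcut without a helper column: sort the selected row indices.
top3_alt <- df[sort(head(order(df$p), n = 3)), ]
```

Both top3 and top3_alt hold the rows as, yu, hy in their original order. Like the plain head() version, they cap the result at 3 rows without reporting ties.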
It's up to you which behavior makes more sense for your data.
Edit: an option that preserves order:
df[ rank(df$p) < 4,]
(inspired by @NotThatKindODr's suggested use of the ordered row_number() %in% 1:3)
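One thing worth checking with the rank() version is how it treats ties, since rank() defaults to ties.method = "average". The sketch below uses a hypothetical extra row "tie" to compare the default with ties.method = "first"; which behavior is preferable depends on whether tied rows should all be kept.

```r
# Data from the question plus a hypothetical row that ties for third place.
df_tie <- data.frame(
  p = c(0.6, 0.3, 0.05, 0.7, 0.6),
  b = c("ab", "bc", "ak", "ka", "zz"),
  row.names = c("as", "yu", "hy", "get", "tie")
)

# Default ties.method = "average": both 0.6 rows get rank 3.5,
# so both pass the "< 4" filter and 4 rows come back.
res_avg <- df_tie[rank(df_tie$p) < 4, ]

# ties.method = "first" breaks ties by position, so exactly 3 rows
# come back (or all rows, if the data frame has fewer than 3).
res_first <- df_tie[rank(df_tie$p, ties.method = "first") <= 3, ]
```

Here res_avg contains as, yu, hy and tie, while res_first contains as, yu and hy. With a three-way tie for third place the default would instead return only 2 rows, because the tied rows would all get rank 4.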