r, dataframe, dplyr, filter

Extracting the rows of a dataframe corresponding to the n smallest positive values and the n largest negative values of a certain variable in R


Imagine I have a table like the following one.

set.seed(12)
table = 
  data.frame(
    value = rnorm(n = 10), 
    par = runif(n = 10, min = - 1, max = 1)
  )

How can I extract the entries of value and par that correspond to the two smallest values of par above zero and the two largest values of par below zero? I would like to obtain something like:

out = 
  data.frame(
    value = c(-0.2722960, -0.1064639, -0.3153487, 0.4280148),
    par = c(-0.464112814, - 0.121141350, 0.009535904, 0.339638592)
  )

I would appreciate it if this could be done with dplyr, so that I can apply it to bigger dataframes with grouping variables.


Solution

  • If you add par >= 0 to the grouping, slice_min(abs(par), n = 2) picks the two values of par closest to zero on each side (an exact zero, if present, counts as positive):

    library(dplyr, warn.conflicts = FALSE)
    set.seed(12)
    table = 
      data.frame(
        value = rnorm(n = 10), 
        par = runif(n = 10, min = - 1, max = 1)
      )
    
    table |> 
      group_by(pos = par >= 0) |>
      slice_min(abs(par), n = 2) |>
      ungroup()
    #> # A tibble: 4 × 3
    #>    value      par pos  
    #>    <dbl>    <dbl> <lgl>
    #> 1 -0.106 -0.121   FALSE
    #> 2 -0.272 -0.464   FALSE
    #> 3 -0.315  0.00954 TRUE 
    #> 4  0.428  0.340   TRUE
    

    Created on 2024-04-25 with reprex v2.1.0
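
The question also mentions bigger dataframes with grouping variables and shows the expected result sorted by par. A minimal sketch of how the same idea might extend to that case is given below. The column grp and the names grouped_table and closest_to_zero are hypothetical, made up for illustration. The sketch also assumes that exact zeros should be dropped and that ties should not return extra rows.

```r
library(dplyr, warn.conflicts = FALSE)

set.seed(12)
# `grp` is a hypothetical grouping column, added only for illustration
grouped_table <- data.frame(
  grp   = rep(c("a", "b"), each = 10),
  value = rnorm(n = 20),
  par   = runif(n = 20, min = -1, max = 1)
)

closest_to_zero <- grouped_table |>
  # assumption: an exact zero is neither "above" nor "below" zero, so drop it
  filter(par != 0) |>
  # group by the existing grouping variable and by the sign of par
  group_by(grp, pos = par > 0) |>
  # keep the two rows closest to zero per group; with_ties = FALSE caps it at two
  slice_min(abs(par), n = 2, with_ties = FALSE) |>
  ungroup() |>
  # sort by par within each group, as in the expected output of the question
  arrange(grp, par) |>
  # drop the helper column
  select(-pos)

print(closest_to_zero)
```

With with_ties = FALSE, each sign group contributes at most two rows. Leaving it at its default of TRUE would instead return every row tied for second place.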