r, dplyr, data-cleaning

Filtering data frame by condition including data after that condition


Is there an easy way to filter my data frame so that the first row meeting some condition, and every row after it, is filtered out? It needs to be robust enough to handle the case where the condition is never met, in which case the whole data frame should be returned. My examples below show what I mean:

library(dplyr)

## Works
mtcars %>% 
  as_tibble() %>% 
  filter(between(row_number(), 1, which(mpg == 17.8)))

#> # A tibble: 11 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> 11  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4

## Doesn't work
mtcars %>% 
  as_tibble() %>% 
  filter(between(row_number(), 1, which(mpg == 30.5)))

#> Error in filter_impl(.data, quo): Evaluation error: Expecting a single value: [extent=0]..

Created on 2018-08-12 by the reprex package (v0.2.0).


Solution

  • You could use ifelse() to check whether the value is present in the data frame. You also need to take the first row where the condition is met, to handle values that appear more than once (21.0 in your example). Subtracting 1 from that index drops the matching row itself.

    library(dplyr)

    mtcars %>%
      as_tibble() %>%
      filter(between(row_number(), 1, ifelse(!any(mpg == 30), n(), which(mpg == 30)[1] - 1)))
    ## Returns the whole tibble

    mtcars %>%
      as_tibble() %>%
      filter(between(row_number(), 1, ifelse(!any(mpg == 21), n(), which(mpg == 21)[1] - 1)))
    ## Returns a tibble with 0 rows

    mtcars %>%
      as_tibble() %>%
      filter(between(row_number(), 1, ifelse(!any(mpg == 21.4), n(), which(mpg == 21.4)[1] - 1)))
    ## returns:
    # A tibble: 3 x 11
        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1  21.0     6   160   110  3.90 2.620 16.46     0     1     4     4
    2  21.0     6   160   110  3.90 2.875 17.02     0     1     4     4
    3  22.8     4   108    93  3.85 2.320 18.61     1     1     4     1
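
A possibly simpler alternative, offered as a sketch rather than as the approach used above: dplyr's cumall() stays TRUE only until the first FALSE, so it can express "keep everything before the first match" without which() or ifelse(). The object names (before_match, no_match, through_match) are made up for illustration, and the last variant, which also keeps the matching row as in the question's first example, assumes that shifting the flags with lag() is acceptable.

```r
library(dplyr)

# Keep rows strictly before the first match; the matching row is dropped.
before_match <- mtcars %>%
  as_tibble() %>%
  filter(cumall(mpg != 21.4))

# No row has mpg == 30.5, so the flag never turns FALSE and all rows are kept.
no_match <- mtcars %>%
  as_tibble() %>%
  filter(cumall(mpg != 30.5))

# Variant that also keeps the first matching row: shift the flags down by one,
# filling the first position with TRUE.
through_match <- mtcars %>%
  as_tibble() %>%
  filter(lag(cumall(mpg != 17.8), default = TRUE))

nrow(before_match)   # 3
nrow(no_match)       # 32
nrow(through_match)  # 11
```

Because cumall() works on the whole logical vector, there is no zero-length index to guard against when the value is absent, and a value that appears in the first row simply yields zero rows.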