Search code examples
rfilterdplyrconditional-statementsrelational

flitering with dplyr's filter() – using relational operators


I am stuck. How do I simultaneous us two relational operators as filters within group within group using dplyr's filter()

What I got

data,

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
tbl <- structure(list(id1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
           2L, 2L, 2L, 2L, 2L, 2L), id2 = c("x_02", "x_02", "x_02", "x_02", "x_02", "x_02",
           "x_02", "x_02", "x_02", "x_02", "x_02", "x_02", "x_02", "x_03", "x_03", "x_03",
           "x_03", "x_03", "x_03", "x_03", "x_03"), x = c(-4L, -3L, -2L, -1L, 1L, 2L, 3L,
           4L, 5L, -2L, -1L, 1L, 2L, -2L, -1L, 1L, 2L, 3L, 4L, 5L, 6L)),
           class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L),
           vars = c("id1", "id2"), drop = TRUE, .Names = c("id1", "id2", "x"),
           indices = list(0:8, 9:12, 13:20), group_sizes = c(9L, 4L, 8L),
           biggest_group_size = 9L, labels = structure(list(id1 = c(1L, 2L, 2L),
           id2 = c("x_02", "x_02", "x_03")), class = "data.frame", row.names = c(NA, -3L),
           vars = c("id1", "id2"), drop = TRUE, .Names = c("id1", "id2")))

tbl
#> # A tibble: 21 x 3
#> # Groups:   id1, id2 [3]
#>      id1   id2     x
#>    <int> <chr> <int>
#>  1     1  x_02    -4
#>  2     1  x_02    -3
#>  3     1  x_02    -2
#>  4     1  x_02    -1
#>  5     1  x_02     1
#>  6     1  x_02     2
#>  7     1  x_02     3
#>  8     1  x_02     4
#>  9     1  x_02     5
#> 10     2  x_02    -2
#> # ... with 11 more rows

In short I want to lookup within id1, within id2, and find series of xs that start at x < -2 and end at x > 2 (the desired outcome below mig illustrate it better than I describe it here).

In a way I see it as a combination of two filters, i.e., I want this filter,

library(dplyr)
tbl %>% group_by(id1, id2) %>%
   filter( (row_number() == n() & x >  2 ) )
#> # A tibble: 2 x 3
#> # Groups:   id1, id2 [2]
#>     id1   id2     x
#>   <int> <chr> <int>
#> 1     1  x_02     5
#> 2     2  x_03     6

to be combined with this filter,

tbl %>% group_by(id1, id2) %>%
   filter( (row_number() == 1 & x < -2 )  )
#> # A tibble: 1 x 3
#> # Groups:   id1, id2 [1]
#>     id1   id2     x
#>   <int> <chr> <int>
#> 1     1  x_02    -4

something like this, I thought, but that doesn't give me any data.

tbl %>% group_by(id1, id2) %>%
   filter( (row_number() == n() & x >  2 ) &
           (row_number() == 1   & x < -2 )  )

Why doesn't this give?

What I am trying to get / desired outcome

# A tibble: 2 x 3
# Groups:   id1, id2 [1]
    id1   id2     x
  <int> <chr> <int>
1     1  x_02    -4
1     1  x_02     5

Solution

  • When both conditions are met within a group, there should be 2 rows for that group, so just filter again n() == 2

    tbl %>%
      filter((row_number() == n() & x > 2 ) | (row_number() == 1 & x < -2 )) %>%
      filter(n() == 2)
    
    # A tibble: 2 x 3
    # Groups: id1, id2 [1]
        # id1 id2       x
      # <int> <chr> <int>
    # 1     1 x_02     -4
    # 2     1 x_02      5
    

    Try the additional (see comments):

    tbl %>%
      filter((row_number() == n() & x > 2 ) | (row_number() == 1 & x < -2 )) %>%
      filter(n() == 2) %>%
      distinct(id1, id2) %>%
      left_join(., tbl, by=c("id1", "id2"))