I have a data frame that gives different results depending on the filtering method used (see the screenshot below), which seems weird. Any thoughts on why this might be?
Could it be the presence of NAs in the data?
library(tidyverse)
df <- tibble(x = factor(rep(c(1:3, NA), 5))) # 20 rows x 1 column
df |> filter(x == 2)
#> # A tibble: 5 × 1
#> x
#> <fct>
#> 1 2
#> 2 2
#> 3 2
#> 4 2
#> 5 2
df[df$x == 2, ]
#> # A tibble: 10 × 1
#> x
#> <fct>
#> 1 2
#> 2 <NA>
#> 3 2
#> 4 <NA>
#> 5 2
#> 6 <NA>
#> 7 2
#> 8 <NA>
#> 9 2
#> 10 <NA>
Created on 2024-04-18 with reprex v2.1.0
This is noted at the top of the ?dplyr::filter help page:
Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [.
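If you want the two approaches to agree, here is a minimal sketch using the same df as in the reprex. With base subsetting, an NA in a logical index returns a row of NAs rather than dropping it, so the NA has to be handled explicitly. The object names (cond, base_which, base_explicit, with_na) are only illustrative.

```r
library(tibble)

df <- tibble(x = factor(rep(c(1:3, NA), 5)))

# The comparison yields NA (not FALSE) wherever x is NA
cond <- df$x == 2

# Option 1: which() returns only the positions where cond is TRUE,
# so the NA entries never reach the `[` index
base_which <- df[which(cond), ]

# Option 2: spell out the NA handling in the condition itself
# (FALSE & NA is FALSE, so the index contains no NA)
base_explicit <- df[!is.na(df$x) & df$x == 2, ]

# Going the other way: keep the NA rows with filter() by asking for them
with_na <- dplyr::filter(df, x == 2 | is.na(x))
```

Both base options should return the same 5 rows as filter(x == 2), while with_na keeps the 5 NA rows as well. Note that with_na lists rows in their original order with real NA values in x, whereas df[df$x == 2, ] produces all-NA rows as a side effect of indexing with NA.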