
Inconsistency using subsetting vs dplyr::filter


I have a data frame that gives different results depending on the filtering method used (see the screenshot below), which seems weird. Any thoughts on why this might be?

[screenshot]


Solution

  • Could be the presence of NAs in the data?

    library(tidyverse)
    
    df <- tibble(x = factor(rep(c(1:3, NA), 5))) # 20 x 1: five each of 1, 2, 3 and NA
    
    df |> filter(x == 2)
    #> # A tibble: 5 × 1
    #>   x    
    #>   <fct>
    #> 1 2    
    #> 2 2    
    #> 3 2    
    #> 4 2    
    #> 5 2
    df[df$x == 2, ]
    #> # A tibble: 10 × 1
    #>    x    
    #>    <fct>
    #>  1 2    
    #>  2 <NA> 
    #>  3 2    
    #>  4 <NA> 
    #>  5 2    
    #>  6 <NA> 
    #>  7 2    
    #>  8 <NA> 
    #>  9 2    
    #> 10 <NA>
    

    Created on 2024-04-18 with reprex v2.1.0

    This is noted at the top of the ?dplyr::filter help page:

    Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [.
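
    The mechanism, as far as I can tell: `df$x == 2` evaluates to NA wherever `x` is NA, and `[` returns an all-NA row for each NA in a logical index, whereas `filter()` keeps only rows where the condition is TRUE. If that is what is happening in your data, the following sketch shows two ways to make base subsetting match `filter()`, and one way to make `filter()` keep the NA rows. It reuses the toy `df` from above; the names `base_which`, `base_explicit` and `keep_na` are just illustrative.

    ```r
    library(dplyr)

    # Same toy data as above: levels 1-3 plus NA, repeated 5 times (20 rows)
    df <- tibble(x = factor(rep(c(1:3, NA), 5)))

    # which() returns only the positions where the condition is TRUE,
    # so NA comparisons are dropped, as in filter()
    base_which <- df[which(df$x == 2), ]

    # Equivalent: turn the NA comparisons into explicit FALSE
    base_explicit <- df[!is.na(df$x) & df$x == 2, ]

    # Going the other way: ask filter() to keep the rows where x is NA
    keep_na <- df |> filter(x == 2 | is.na(x))

    nrow(base_which)    #> [1] 5
    nrow(base_explicit) #> [1] 5
    nrow(keep_na)       #> [1] 10
    ```

    `which()` is the shorter of the two base options; the explicit `!is.na()` version makes the intent more obvious to a reader.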