
R subsetting with colname and logical operator


Can anybody explain why the third subsetting example below does not give the same result as the first and second? I thought the first three cases were equivalent. Also, mtcars["cyl">=6,] returns rows, so why does mtcars["cyl"==6,] return none?

Thanks a lot!

data("mtcars")

mtcars[mtcars[,2]==6,]
#>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
mtcars[mtcars$cyl==6,]
#>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
mtcars["cyl"==6,]
#>  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
#> <0 rows> (or 0-length row.names)

# but 
mtcars["cyl">=6,]
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
....

Created on 2020-07-31 by the reprex package (v0.3.0)


Solution

  • In the third case you get zero rows because you are comparing the string "cyl" (not the column cyl) with the number 6. When a character string is compared with a number, R coerces both operands to a common type, which here is character: the number 6 becomes the string "6".

    The condition actually evaluated is therefore "cyl" == "6", which is FALSE. That single FALSE is used as the row index and recycled across every row, so no rows are selected.


    Regarding the fourth case, we can see that

    > "cyl" >= 6
    [1] TRUE
    

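    Here too the operands have different types ("cyl" is character and 6 is numeric), so R coerces the number to the character string "6".

    The condition "cyl" >= "6" is TRUE because strings are compared using the collation order of the current locale, and in common locales digits such as "6" sort before letters. That single TRUE is recycled across every row, which is why the fourth example returns the whole data frame (note the 4-cylinder Datsun 710 in its output) rather than only the cars with six or more cylinders.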
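
The two cases can be checked directly. The following is a minimal sketch rather than a definitive recipe: the variable names (cond_eq, cond_ge, no_rows, all_rows, six_cyl) are made up for illustration, and the result of the >= comparison assumes a locale in which digits sort before letters. The last part shows one way to filter on a column selected by its name, which is probably what the third example was meant to do.

```r
# Comparing a string with a number coerces the number to a string,
# so both conditions are single logical values, not one value per row.
cond_eq <- "cyl" == 6   # evaluated as "cyl" == "6"
cond_ge <- "cyl" >= 6   # evaluated as "cyl" >= "6" (locale-dependent collation)

# A length-one logical row index is recycled across all rows.
no_rows  <- mtcars[cond_eq, ]   # FALSE for every row
all_rows <- mtcars[cond_ge, ]   # TRUE for every row (assuming digits sort first)

nrow(no_rows)
nrow(all_rows) == nrow(mtcars)

# To filter on a column chosen by name, extract the column first,
# then compare its values with the number.
six_cyl <- mtcars[mtcars[["cyl"]] == 6, ]
identical(six_cyl, mtcars[mtcars$cyl == 6, ])
```

Extracting the column with [[ ]] (or with mtcars[, "cyl"]) yields a numeric vector, so the comparison is made row by row and no string coercion takes place.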