
Strange behaviour of == with formula


I am a bit puzzled by the following. I have two formulas and would like to check whether they are the same. Here I expect to get FALSE returned.

fm1 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel + trede)
fm2 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel)
fm1 == fm2
#> [1] TRUE

identical(fm1, fm2)
#> [1] FALSE

What is the reason that fm1 == fm2 returns TRUE?

Created on 2021-12-17 by the reprex package (v2.0.1)


Solution

  • == is designed to compare values in atomic vectors, not formulas.

    Even for atomic vectors, exact comparison with == can surprise you. See the following example from ?"==":

    x1 <- 0.5 - 0.3
    x2 <- 0.3 - 0.1
    x1 == x2                   # FALSE on most machines
    isTRUE(all.equal(x1, x2))  # TRUE everywhere
    

    Applied to your example you can find:

    > fm1 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel + trede)
    > fm2 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel)
    > fm1 == fm2
    [1] TRUE
    > 
    > all.equal(fm1, fm2)
    [1] "formulas differ in contents"
    > isTRUE(all.equal(fm1,fm2))
    [1] FALSE
    

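    With fewer predictors, == does return the expected result. According to ?Comparison, language objects such as calls are deparsed to character strings before comparison; a long formula deparses to several lines, and == appears to compare only the first of them, which would explain why the two long formulas look equal. Either way, == is not reliable for this type of comparison: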

    > fm1 <- formula(schades ~ termijn + zipcode + provincie)
    > fm2 <- formula(schades ~ termijn + zipcode)
    > fm1 == fm2
    [1] FALSE
    > isTRUE(all.equal(fm1,fm2))
    [1] FALSE
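
The deparsing behaviour can be checked directly. The following is a minimal sketch, not a definitive account of how == is implemented: it assumes the default width.cutoff of 60 for deparse(), and same_formula() is a hypothetical helper name introduced here for illustration only.

```r
# Formulas from the question.
fm1 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel + trede)
fm2 <- formula(schades ~ termijn + zipcode + provincie + regionvormgemeente + energielabel)

# A long formula deparses to more than one string (assumes the default width.cutoff of 60).
d1 <- deparse(fm1)
d2 <- deparse(fm2)
length(d1) > 1       # the deparsed text is split across several lines
d1[1] == d2[1]       # the first lines match, consistent with fm1 == fm2 being TRUE
identical(d1, d2)    # the complete deparsed text differs

# Hypothetical helper: compare the complete deparsed text of two formulas.
# Note: this deliberately ignores the environment attached to each formula.
same_formula <- function(a, b) {
  stopifnot(inherits(a, "formula"), inherits(b, "formula"))
  identical(deparse(a), deparse(b))
}

same_formula(fm1, fm2)
same_formula(fm1, fm1)
```

If the environment of the formulas matters as well, identical(fm1, fm2) is the stricter check, since it also compares attributes.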