Tags: r, integer, double, storage, notation

Why does identical() return FALSE for a double and an integer in R? How does identical() work for different types of objects?


(reproducible example added)

I cannot quite grasp why the following is FALSE (I am aware that they are a double and an integer, respectively):

identical(1, as.integer(1)) # FALSE

?identical reveals:

num.eq: logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison. The latter (non-default) differentiates between -0 and +0.

sprintf("%.8190f", as.integer(1)) and sprintf("%.8190f", 1) appear to return exactly the same bit pattern. So I think at least one of the following must return TRUE, but I get FALSE for each of them:

identical(1, as.integer(1), num.eq=TRUE) # FALSE
identical(1, as.integer(1), num.eq=FALSE) # FALSE

My current thinking is this: if sprintf is a notation indicator rather than a storage indicator, then identical() compares based on storage, i.e. identical(bitpattern1, bitpattern2) returns FALSE. I could not find any other logical explanation for the FALSE/FALSE result above.

I do know that in both the 32-bit and 64-bit builds of R, integers are stored as 32 bits.


Solution

  • They are not identical precisely because they have different types. If you look at the documentation for identical you'll find the example identical(1, as.integer(1)) with the comment ## FALSE, stored as different types. That's one clue. The R language definition reminds us that:

    Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length 1; there are no more basic types (emphasis mine).

    So, basically everything is a vector with a type (that's also why [1] shows up every time R returns something). You can check this by explicitly creating a vector with length 1 by using vector, and then comparing it to 0:

    x <- vector("double", 1)
    identical(x, 0)
    # [1] TRUE
    

    That is to say, both vector("double", 1) and 0 are vectors of type "double" with length 1.
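
    The same check can be run on the two values from the question. This is only a small sketch; the names x and y are mine, not from the question:

    ```r
    # A bare numeric literal is a double; as.integer() (or an L suffix) gives an integer.
    x <- 1
    y <- as.integer(1)

    typeof(x)   # "double"
    typeof(y)   # "integer"

    # Both are vectors of length 1.
    length(x)
    length(y)

    # The values compare equal with ==, but the types differ,
    # and identical() returns FALSE when the types differ.
    x == y
    typeof(x) == typeof(y)
    identical(x, y)
    ```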

    typeof and storage.mode report the same thing for these vectors, so you're kind of right when you say "this means identical() compares based on storage": two objects of different types are never identical, whatever values they hold. I don't think this necessarily means that "bit patterns" are being compared, although I suppose it's possible. See what happens when you change the storage mode using storage.mode:

    ## Assign integer to x. This is really a vector length == 1.
    x <- 1L
    
    typeof(x)
    # [1] "integer"
    
    identical(x, 1L)
    # [1] TRUE
    
    ## Now change the storage mode and compare again. 
    storage.mode(x) <- "double"
    
    typeof(x)
    # [1] "double"
    
    identical(x, 1L) # This is no longer TRUE.
    # [1] FALSE
    
    identical(x, 1.0) # But this is.
    # [1] TRUE
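
    If you want to look at the stored bytes rather than the printed text, one way (a sketch, not the only option) is writeBin with a raw() target. The names bytes_int and bytes_dbl are mine, and endian = "big" is set only so that the byte order does not depend on the platform:

    ```r
    # Serialize each value to its raw bytes.
    bytes_int <- writeBin(1L, raw(), endian = "big")
    bytes_dbl <- writeBin(1, raw(), endian = "big")

    bytes_int   # 00 00 00 01              (4 bytes)
    bytes_dbl   # 3f f0 00 00 00 00 00 00  (8 bytes)

    # The stored representations differ in both length and content.
    length(bytes_int)
    length(bytes_dbl)
    identical(bytes_int, bytes_dbl)   # FALSE

    # sprintf() formats the value as text, so the two strings can match
    # even though the underlying storage does not.
    sprintf("%.2f", 1L) == sprintf("%.2f", 1)
    ```

    So the matching sprintf output in the question says the two values print alike under %f; it says nothing about how they are stored.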
    

    One last note: The documentation for identical states that num.eq is a…

    logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison.

    So, changing num.eq doesn't affect any comparison involving integers. Try the following:

    # Comparing integers with integers.
    identical(+0L, -0L, num.eq = TRUE)  # TRUE
    identical(+0L, -0L, num.eq = FALSE) # TRUE
    
    # Comparing integers with doubles.
    identical(+0, -0L, num.eq = TRUE)  # FALSE
    identical(+0, -0L, num.eq = FALSE) # FALSE
    
    # Comparing doubles with doubles.
    identical(+0.0, -0.0, num.eq = TRUE)  # TRUE
    identical(+0.0, -0.0, num.eq = FALSE) # FALSE
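
    If what you actually want is to compare the values regardless of type, a few base R options are sketched below. Which one fits depends on your use case, and the names x and y are again just for illustration:

    ```r
    x <- 1
    y <- as.integer(1)

    # == coerces the integer to double before comparing.
    x == y                        # TRUE

    # all.equal() compares numeric values with a tolerance;
    # wrap it in isTRUE() to get a plain logical.
    isTRUE(all.equal(x, y))       # TRUE

    # Convert explicitly so that both sides have the same type,
    # then identical() agrees.
    identical(x, as.numeric(y))   # TRUE
    identical(as.integer(x), y)   # TRUE
    ```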