How can I compare two factors with different levels?


Is it possible to compare two factors of the same length but with different levels? For example, suppose we have these two factor variables:

A <- factor(1:5)

str(A)
 Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5

B <- factor(c(1:3,6,6))

str(B)
 Factor w/ 4 levels "1","2","3","6": 1 2 3 4 4

If I try to compare them using, for example, the == operator:

mean(A == B)

I get the following error:

Error in Ops.factor(A, B) : level sets of factors are different


Solution

  • Convert to character then compare:

    # data
    A <- factor(1:5)
    B <- factor(c(1:3,6,6))
    
    str(A)
    # Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
    str(B)
    # Factor w/ 4 levels "1","2","3","6": 1 2 3 4 4
    
    mean(A == B)
    # Error in Ops.factor(A, B) : level sets of factors are different

    mean(as.character(A) == as.character(B))
    # [1] 0.6
    

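    Another approach is to index each factor's levels by the factor itself, which also yields character vectors:

    mean(levels(A)[A] == levels(B)[B])
    # [1] 0.6

    This was 2 times slower on a dataset of 1e8 elements.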
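If you would rather keep both variables as factors, one further option (not part of the answer above, just a minimal sketch) is to re-encode them on a shared level set so that `==` no longer complains. The names `all_levels`, `A2`, `B2` and `shared_match` are made up for illustration:

```r
# data, same as in the question
A <- factor(1:5)
B <- factor(c(1:3, 6, 6))

# Union of both level sets: "1" "2" "3" "4" "5" "6"
all_levels <- union(levels(A), levels(B))

# Re-encode both factors against the same levels;
# the values themselves are unchanged, only the level set grows
A2 <- factor(A, levels = all_levels)
B2 <- factor(B, levels = all_levels)

# Comparison now works because the level sets match
shared_match <- mean(A2 == B2)
shared_match
# [1] 0.6
```

This gives the same result as the `as.character()` comparison. It is mainly useful when the factors need to stay factors afterwards, for example before combining them in one data frame column.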