I am generating a big list of factors with different levels, and I want to be able to detect when two of them define the same partition. For example, I want to detect all of the following as equivalent to each other:
x1 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"))
x2 <- factor(c("c", "c", "b", "b", "a", "a", "c", "c"))
x3 <- factor(c("x", "x", "y", "y", "z", "z", "x", "x"))
x4 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"), levels=c("b", "c", "a"))
What is the best way to do this?
I guess you want to establish that the two-way tabulation has the same number of populated cells as the one-way classification has levels. By default, interaction() keeps every combination of levels, populated or not, but setting drop=TRUE keeps only the combinations that actually occur, which suits your purpose:
> levels(interaction(x1, x2, drop=TRUE))
[1] "c.a" "b.b" "a.c"
> length(levels(x1)) == length(levels(interaction(x1, x2, drop=TRUE)))
[1] TRUE
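One caveat with the comparison above: it only counts the levels of x1, so it would also return TRUE for a second factor that is strictly coarser than x1, and unused levels in x1 would throw the count off. A minimal sketch of a symmetric check follows. The helper name same_partition and the example factor coarse are made up for illustration, and NA values are not handled.

```r
# Hypothetical helper (not from the original answer): TRUE when f and g
# split the observations into exactly the same groups.
same_partition <- function(f, g) {
  stopifnot(length(f) == length(g))
  f <- droplevels(f)  # ignore levels that never occur
  g <- droplevels(g)
  n_joint <- nlevels(interaction(f, g, drop = TRUE))
  # The joint classification must have no more cells than either factor alone.
  n_joint == nlevels(f) && n_joint == nlevels(g)
}

x1 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"))
x2 <- factor(c("c", "c", "b", "b", "a", "a", "c", "c"))
# Illustrative factor that merges the "a" and "b" groups of x1.
coarse <- factor(c("p", "p", "p", "p", "q", "q", "p", "p"))

same_partition(x1, x2)      # TRUE
same_partition(x1, coarse)  # FALSE, although the one-sided check gives TRUE
```

Comparing against both level counts is what makes the test symmetric: each level of one factor has to correspond to exactly one level of the other.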
The generalization would wrap the three necessary logical comparisons in all():
all(length(levels(x1)) == length(levels(interaction(x1, x2, drop=TRUE))),
    length(levels(x1)) == length(levels(interaction(x1, x3, drop=TRUE))),
    length(levels(x1)) == length(levels(interaction(x1, x4, drop=TRUE))))
#[1] TRUE
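With a big list of factors, writing out each comparison by hand does not scale. The sketch below runs the same interaction-based test over a list with vapply, and additionally compares against each candidate's own level count so that a coarser factor is not reported as a match. The names others, n_ref and matches are illustrative, and it assumes all factors have the same length and contain no NA values.

```r
x1 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"))
x2 <- factor(c("c", "c", "b", "b", "a", "a", "c", "c"))
x3 <- factor(c("x", "x", "y", "y", "z", "z", "x", "x"))
x4 <- factor(c("a", "a", "b", "b", "c", "c", "a", "a"),
             levels = c("b", "c", "a"))

# Candidates to compare against the reference factor x1.
others <- list(x2 = x2, x3 = x3, x4 = x4)

# Number of levels of x1 that actually occur.
n_ref <- nlevels(droplevels(x1))

matches <- vapply(others, function(f) {
  # Populated cells of the joint classification of x1 and f.
  n_joint <- nlevels(interaction(x1, f, drop = TRUE))
  n_joint == n_ref && n_joint == nlevels(droplevels(f))
}, logical(1))

matches       # named logical vector, one entry per candidate
all(matches)  # TRUE
```

The named result makes it easy to see which candidates fail when all() comes back FALSE.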