
Test whether one factor is nested in another


Is there an easy way to determine if one vector is nested within another? In other words, in the example below, each value of bar is associated with one and only one value of foo, so bar is nested within foo.

data.frame(foo=rep(seq(4), each=4), bar=rep(seq(8), each=2))

To clarify, here is the desired result:

foo <- rep(seq(4), each=4)
bar <- rep(seq(8), each=2)
qux <- rep(seq(8), times=2)
# using a fake operator for illustration:
bar %is_nested_in% foo  # should return TRUE
qux %is_nested_in% foo  # should return FALSE

Solution

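  • Suppose you have two factors f and g, and want to know whether g is nested in f. Both must really be factors; convert numeric vectors such as foo and bar with factor() first, otherwise model.matrix() treats them as numeric covariates.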

    Method 1: For people who love linear algebra

    Consider the design matrices of the two factors:

    Xf <- model.matrix(~ f + 0)
    Xg <- model.matrix(~ g + 0)
    

    If g is nested in f, then every column of Xf is a sum of columns of Xg, so the column space of Xf must be a subspace of the column space of Xg. In other words, for any linear combination of Xf's columns, y = Xf %*% bf, the equation Xg %*% bg = y can be solved exactly.

    y <- Xf %*% rnorm(ncol(Xf))  ## a random linear combination of `Xf`'s columns
    c(crossprod(round(.lm.fit(Xg, y)$residuals, 8)))  ## sum of squared least squares residuals (rounded to 8 decimals)
    ## if this is 0, you have nesting.
    

    Method 2: For people who love statistics

    Check the contingency table:

    M <- table(f, g)
    

    If every column has exactly one non-zero entry, then g is nested in f. In other words:

    all(colSums(M > 0L) == 1L)
    ## `TRUE` if you have nesting
    

    Comment: Either method can easily be squeezed into a single line.
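
For convenience, the contingency-table check (Method 2) could be wrapped into the operator sketched in the question. This is only a minimal sketch: the name `%is_nested_in%` comes from the question's illustration and is not an existing R operator, and the coercion with as.factor() plus droplevels() is an added assumption so that plain integer vectors and factors with unused levels are handled.

```r
# Hypothetical operator (name borrowed from the question's illustration).
# Returns TRUE when every observed level of `g` co-occurs with exactly
# one level of `f`, i.e. when `g` is nested in `f`.
`%is_nested_in%` <- function(g, f) {
  stopifnot(length(g) == length(f))
  # Assumption: coerce to factors and drop unused levels, so that empty
  # levels do not show up as all-zero columns in the table.
  f <- droplevels(as.factor(f))
  g <- droplevels(as.factor(g))
  M <- table(f, g)              # rows: levels of f, columns: levels of g
  all(colSums(M > 0L) == 1L)    # each g level appears under exactly one f level
}

foo <- rep(seq(4), each = 4)
bar <- rep(seq(8), each = 2)
qux <- rep(seq(8), times = 2)

bar %is_nested_in% foo  # TRUE
qux %is_nested_in% foo  # FALSE
```

The left-hand operand is the candidate nested vector, so the call reads the same way as the question's pseudo-code.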