Why Do User Defined Contrasts in R need to be provided as an INVERSE matrix of weights?

I would like to use some user defined contrasts in R in addition to the default contrast codes (contr.treatment / contr.sum / contr.helmert). However, the guidance I have read indicates that these need to be provided in an inverse matrix. Could anybody explain why?

Ie, the guidance here: https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/ states:

The final contrast matrix (or coding scheme) turns out to be the inverse of mat transposed.

While there are a handful of automatic contrast functions in R (what I’ve been using so far), you will sometimes find yourself wanting to run comparisons that are not included there. When that happens, you can specify them yourself. You need to be careful, though, because the contrasts() function is a sneaky little bastard, as noted above. To apply contrast weights, you’ll need to give it the inverse of your matrix of weights.

Neither explains why. In addition, computing an inverse matrix messes up the contrast coefficients so that they become difficult to interpret as they are no longer a standard unit apart, so I'd like to know why it's necessary.

Solution

Let me try and see what happens on a simple example.

We start with the data:

x = sample(letters[1:3], size = 300, replace = T)
y = rnorm(300)
head(x, 10)
 [1] "c" "c" "c" "c" "a" "a" "a" "c" "b" "a"

Let's say your custom contrasts are:

contr_mat = matrix(c(1, -0.5, -0.5,
                     1, 0, -1),
                   nrow = 2,  byrow = TRUE)
contr_mat
     [,1] [,2] [,3]
[1,]    1 -0.5 -0.5
[2,]    1  0.0 -1.0

So you are comparing for example the mean of y in group a to the mean of y in group c (second contrast).

Let's build a contrast coding (weight) matrix using the pseudoinverse of the contrast matrix using MASS::ginv(). We also add an intercept term:

C1 <- cbind(1, ginv(contr_mat))
round(C1, 3)
     [,1]   [,2] [,3]
[1,]    1  0.667    0
[2,]    1 -1.333    1
[3,]    1  0.667   -1

This matrix is square and full rank, and by inverting it we get the contrasts back, with an intercept term:

C <- solve(C1)
round(C, 2)
     [,1]  [,2]  [,3]
[1,] 0.33  0.33  0.33
[2,] 1.00 -0.50 -0.50
[3,] 1.00  0.00 -1.00

Because C and C1 are inverses, you can include them in the model supporting your comparisons:

y = Xu + e = XIu + e= X(C1C)u + e = (XC1)(Cu) + e

where X the design matrix (for this cell means model), u is the vector of means, I the identity matrix, and e the vector of random component with normal distribution. (Cu) represents the contrasts you want.

Now we can use the least square equations to evaluate the contrasts Cu, using the modified design matrix (XC1). So the inverse of the contrast matrix is used to evaluate the actual contrasts Cu:

X <- model.matrix(~x + 0)    # model matrix for the cell means model
X1 <- X %*% C1               # this is the modified model matrix with the inverse

Now we solve by the method of least squares or by lm(). lm() takes the inverse of your contrast matrix as argument and adds the intercept term, then evaluates the contrasts by the least squares method:

# least-squares equation:

solve ( t(X1)  %*% X1 ) %*% t(X1) %*% y
            [,1]
[1,] -0.06524626
[2,]  0.03627632
[3,]  0.04820965

# lm() with modified design matrix
summary(lm(y ~ X1 +0 ) )

Call:
lm(formula = y ~ X1 + 0)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1953 -0.7073  0.0250  0.7382  2.7251 

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
X11 -0.06525    0.06049  -1.079    0.282
X12  0.03628    0.12596   0.288    0.774
X13  0.04821    0.14336   0.336    0.737


# lm() with your contrasts
summary( lm(y ~ x, contrasts = list(x = ginv(contr_mat))) )

Call:
lm(formula = y ~ x, contrasts = list(x = ginv(contr_mat)))

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1953 -0.7073  0.0250  0.7382  2.7251 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06525    0.06049  -1.079    0.282
x1           0.03628    0.12596   0.288    0.774
x2           0.04821    0.14336   0.336    0.737

The last two rows contain the comparisons you want:

means <- tapply(X = y, INDEX = factor(x), FUN = mean)
C %*% means
            [,1]
[1,] -0.06524626
[2,]  0.03627632
[3,]  0.04820965

(means[1] + means[2] + means[3])/3
-0.06524626 
means[1] - (means[2] + means[3])/2
0.03627632 
means[1] - means[3]
0.04820965

I made a similar response with better formatting for the matrix terms here:

https://stats.stackexchange.com/questions/535123/custom-contrasts-in-r-why-should-i-take-the-generalized-inverse-of-transpose-of/612043#612043

A very good reference is:

https://cran.r-project.org/web/packages/codingMatrices/vignettes/codingMatrices.pdf