For robust fitting problem, I want to find outliers by leverage value, which is the diagonal elements of the 'Hat' matrix. Let the data matrix be X
(n * p), Hat matrix is:
Hat = X(X'X)^{-1}X'
where X'
is the transpose of X
.
When n
is large, Hat matrix is a huge (n * n
). So computing it is time consuming. I'm wondering is there a faster way to just compute the leverage values?
You did not specify a programming language, so I will only focus on the algorithm part.
If you have fitted your least square problem orthogonal methods like QR factorization and SVD, then hat matrix is in simple form. You may check out my answer Compute projection / hat matrix via QR factorization, SVD (and Cholesky factorization?) for explicit form of the hat matrix (written in LaTeX). Note, OP there wants complete hat matrix, so I did not demonstrate how to efficiently compute only the diagonal elements. But it is really straightforward. Note that for orthogonal methods the hat matrix ends up with a form QQ'
. The diagonals are row-wise inner product. Cross product between different rows give off-diagonals. In R, such row-wise inner product can be computed as rowSums(Q ^ 2)
.
My answer How to compute diag(X %% solve(A) %% t(X)) efficiently without taking matrix inverse? is in a more general setting. Hat matrix is a special case with A = X'X
. This answer focus on the use of triangular factorization like Cholesky factorization and LU factorization, and shows how to compute only diagonal elements. You will see colSums
rather than rowSums
here, because the hat matrix ends up with a form Q'Q
.
Finally I would like to point out something statistical. High-leverage alone does not signal outliers. The combination of high leverage and high residual (i.e., high Cook's distance) signals outliers.