rperformancercppmatrix-inversecross-product# Fastest way in R to compute the inverse for large matrices

I need to compute a hat matrix (as from linear regression). Standard R code would be:

`H <- tcrossprod(tcrossprod(X, solve(crossprod(X))), X)`

with `X`

being a relatively large matrix (i.e 1e5*100), and this line has to run thousands of times. I understand the most limiting part is the inverse computation, but the crossproducts may be time-consuming too. Is there any faster alternative to perform these matrix operations? I tried Rcpp and reviewed several posts but any alternative I tested was slower. Maybe I did not code properly my C++ code, as I am not an advanced C++ programmer.

Thanks!

Solution

Chasing the code for this line by line is a little difficult because the setup of R code is a little on the complicated side. But read on, pointers below.

The important part is that the topic has been discussed many times: what happens is that R dispatches this to the BLAS (Basic Linear Algebra Subprogram) and LAPACK (Linear Algebra PACKage) libraries. Which contain *the most efficient code known to man* for this. In general, you cannot gain on it by rewriting.

One can gain performance differences by switching one BLAS/LAPACK implementation for another---there are many, many posts on this online too. R itself comes with the so-called 'reference BLAS' known to be correct, but slowest. You can switch to Atlas, OpenBLAS, MKL, ... depending on your operating system; instructions on how to do so are in some of the R manuals that come with your installation.

For completeness, per file `src/main/names.c`

the commands `%*%`

, `crossprod`

and `tcrossprod`

all refer to `do_matprod`

. This is in file src/main/array.c and does much argument checking and arranging and branching on types of arguments but *e.g.* one path then calls

```
F77_CALL(dsyrk)(uplo, trans, &nc, &nr, &one, x, &nr, &zero, z, &nc
FCONE FCONE);
```

which is this LAPACK function. It is essentially the same for all others making this an unlikely venue for your optimisation.

- How can I extract a value from a dataframe based on a range of values?
- Confidence Interval for Some (But Not All) Interaction Terms in a Polynomial Interacted with a Binary Treatment
- Avoiding crazy lines while mapping in r
- Overlay points on top of violin plot
- Add percentage labels to geom_col()
- How to place a js inside a swiper with appendTo()?
- How to make a single plot from two dataframes with ggplot2
- Error in installing "TopicModels" package in Google Colab
- Identify connected subnetworks (R-igraph)
- Adding labels to geom_col()
- Legend title in ggplot2
- R list files with multiple conditions
- R - getting count of maximum-sized sub-group when summarising at prior group_by level
- Problems when running GDC_prepare in R
- Filtering files with names starting with a specific string
- Mutate a vector within a pipe chain
- How to sum a variable by group
- Using hex code to change text color in RMarkdown PDF (R)
- How to Remove Degree and Cardinal Direction Symbols from ggplot Coordinate Axes
- rstan and brms cause R and RStudio session abort
- How to change the plot background color generated by plot(effect(...)) in grey with white grid in R？
- SQL query on arrow duckdb workflow in R
- Venn diagram with duplicated elements
- R- Filter by time closest to midnight
- Difference between rlm() and lm_robust
- Is there a way to combine sorting an rhandsontable and removing from an rhandsontable?
- Split violin plot with ggplot2
- ggbarplot top of one bar does not align with its error bar
- read file from google drive
- Placing text into stacked bar charts in ggplot