Tags: r, regression, linear-regression, floating-accuracy, numerical-stability

Increasing the number of observations makes R return random coefficients - numerical stability problem?


I have this code:

rm(list=ls())
N = 20000
xvar <- runif(N, -10, 10) 
e <- rnorm(N, mean=0, sd=1)
yvar <- 1 + 2*xvar + e
plot(xvar,yvar)
lmMod <- lm(yvar~xvar)
print(summary(lmMod))

I expected the coefficients to be something similar to [1,2].

Instead, with N = 20000, R keeps returning random coefficients that are not statistically significant and don't fit the model, and the $R^2$ is really low. I just don't see what I am doing wrong. Here is an example output:

Call:
lm(formula = yvar ~ xvar)

Residuals:
   Min     1Q Median     3Q    Max 
-47.23  -9.10   1.24  11.23  23.74 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.03163    0.08291   0.381  0.70286   
xvar         0.04290    0.01427   3.006  0.00265 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.73 on 19998 degrees of freedom
Multiple R-squared:  0.0009635, Adjusted R-squared:  0.0009135 
F-statistic: 19.29 on 1 and 19998 DF,  p-value: 1.131e-05

However, if I set N = 200 or N = 2000, it works: the coefficients resemble the real ones and fall within two standard deviations of them, I get $R^2$ values as high as 99%, and all coefficients are statistically significant with $p \ll 0.01$.

What is happening here? Why does increasing the number of observations worsen the regression? Is R covertly running into numerical stability problems?

I am running R 3.6.0 on Kubuntu 19.04. The same problem also occurs when running R on the command line with the --vanilla option.
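One way to isolate the linear-algebra backend as the culprit is to compute the simple-regression slope and intercept in plain R arithmetic, which does not go through the BLAS matrix routines that `lm()` uses internally. A minimal sketch (the seed is my addition, for reproducibility):

```r
# Closed-form OLS for a single predictor, avoiding lm()'s QR/BLAS path:
#   slope = cov(x, y) / var(x);  intercept = mean(y) - slope * mean(x)
set.seed(42)  # seed added here so the run is reproducible
N <- 20000
xvar <- runif(N, -10, 10)
e <- rnorm(N, mean = 0, sd = 1)
yvar <- 1 + 2 * xvar + e

slope <- cov(xvar, yvar) / var(xvar)
intercept <- mean(yvar) - slope * mean(xvar)
print(c(intercept, slope))  # should be close to c(1, 2)
```

If these estimates come out near 1 and 2 while `lm()` on the same data does not, the data and model are fine and the suspect is the BLAS/LAPACK library that `lm()` delegates to.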

EDIT: here is the output of sessionInfo():

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 19.04

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0

Solution

  • As noted in my previous answer, this was due to using Intel's MKL as the BLAS backend.

    However, it seems fixed with the current Intel MKL, as this code shows:

    library(flexiblas)
    
    # Note: "$HOME" is not expanded inside an R string literal,
    # so build the path explicitly:
    mkl <- file.path(Sys.getenv("HOME"), "intel/oneapi/mkl/latest/lib/intel64/libmkl_rt.so")
    idx <- flexiblas_load_backend(mkl)
    flexiblas_switch(idx)
    sessionInfo()
    print(flexiblas_current_backend())
    
    rm(list=ls())
    N = 100000
    xvar <- runif(N, -10, 10) 
    e <- rnorm(N, mean=0, sd=1)
    yvar <- 1 + 2*xvar + e
    plot(xvar,yvar)
    lmMod <- lm(yvar~xvar)
    print(summary(lmMod))
    

    The correct coefficients are returned.
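As a quick sanity check after switching backends, the fit can be verified programmatically against the true coefficients c(1, 2). This is my own sketch, not part of the original answer; the seed and tolerances are assumptions, chosen loosely enough for N = 100000 with unit noise:

```r
# Refit on fresh data and compare the estimates to the true values.
set.seed(1)  # seed added for reproducibility
N <- 100000
xvar <- runif(N, -10, 10)
yvar <- 1 + 2 * xvar + rnorm(N, mean = 0, sd = 1)
lmMod <- lm(yvar ~ xvar)
est <- coef(lmMod)
print(round(est, 3))  # with a working BLAS: (Intercept) near 1, xvar near 2
```

With a healthy backend, both estimates land within a few standard errors of the true values; with the buggy MKL build, the xvar coefficient comes out wildly off, which makes this an easy regression test to keep around.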