When running a linear regression using sparklyr, such as:
cached_cars %>%
  ml_linear_regression(mpg ~ .) %>%
  summary()
The results do not include standard errors:
Deviance Residuals:
     Min       1Q   Median       3Q      Max
-3.47339 -1.37936 -0.06554  1.05105  4.39057

Coefficients:
(Intercept) cyl_cyl_8.0 cyl_cyl_4.0        disp          hp        drat
16.15953652  3.29774653  1.66030673  0.01391241 -0.04612835  0.02635025
         wt        qsec          vs          am        gear        carb
-3.80624757  0.64695710  1.74738689  2.61726546  0.76402917  0.50935118

R-Squared: 0.8816
Root Mean Squared Error: 2.041
Solutions using SparkR are also highly appreciated.
I received a useful answer to my first question at community.rstudio.com.
The answer from yitaoli is the following:
library(sparklyr)

# This is the version of Spark this example was run with, but I think
# everything that follows should work in other versions of Spark as well.
spark_version <- "2.4.4"
sc <- spark_connect(master = "local", version = spark_version)
cached_cars <- copy_to(sc, mtcars)

model <- cached_cars %>%
  ml_linear_regression(mpg ~ .)

# Call summary() on the underlying Spark model object, then read its
# coefficient standard errors.
coeff_std_errs <- invoke(model$model$.jobj, "summary") %>%
  invoke("coefficientStandardErrors")
print(coeff_std_errs)
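
The vector printed above has no names, so it may help to line it up with the coefficient estimates. The following is only a sketch under two assumptions: that Spark returns the intercept's standard error as the last element when an intercept is fitted (which is how the Spark documentation for coefficientStandardErrors describes it), and that model$coefficients is a named vector with the intercept first, as in the summary output shown in the question. The helper names align_std_errs and coef_table are made up for illustration and are not part of sparklyr.

```r
# Hypothetical helper (not a sparklyr function): pairs each estimate with its
# standard error. Assumes `estimates` is a named vector with the intercept
# FIRST and `std_errs` has the intercept LAST.
align_std_errs <- function(estimates, std_errs) {
  std_errs <- as.numeric(unlist(std_errs))
  stopifnot(length(estimates) == length(std_errs))
  n <- length(std_errs)
  # Move the intercept's standard error from the end to the front
  std_errs <- c(std_errs[n], std_errs[-n])
  data.frame(
    term = names(estimates),
    estimate = unname(estimates),
    std_error = std_errs,
    t_value = unname(estimates) / std_errs,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: fetches the standard errors the same way as the
# answer above and aligns them with the fitted coefficients.
coef_table <- function(model) {
  std_errs <- sparklyr::invoke(
    sparklyr::invoke(model$model$.jobj, "summary"),
    "coefficientStandardErrors"
  )
  align_std_errs(model$coefficients, std_errs)
}
```

With the model fitted above, coef_table(model) should return one row per coefficient. It is worth checking the row order against the summary output before relying on it, since the reordering depends on the two assumptions stated above.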