I have a dataset with over 10 categorical variables and about 20 numerical ones. I'm trying to edit Stef van Buuren's mice.impute.logreg function, which is available on GitHub, so that glm.fit() is called with a higher maxit value, to try to reach convergence. However, when I run the code as is, I get the following error:
Error: Only strings can be converted to symbols
and it comes from this line in the code:
rv <- t(chol(sym(fit.sum$cov.unscaled)))
I printed out the contents of fit.sum$cov.unscaled and got a huge covariance matrix(?) covering all variables (with the categorical ones seemingly one-hot-encoded(?)), something like this, but much larger:
              Proteinuria22 Proteinuria23 Proteinuria24 Proteinuria25      Aetiol22      Aetiol23      Aetiol24
              -0.0775687218  6.603074e-02  6.995692e-01 -1.0462947407 -1.990400e-01 -3.756997e+01 -6.198267e-01
Weight2       -0.0003022753  6.802872e-04 -1.138967e-03 -0.0043737786  2.550278e-04  3.380858e-02  6.343819e-04
Height2        0.0174235854 -8.945169e-02 -2.588742e-01  0.2947104430 -1.763788e-01  2.027542e+00 -3.676413e-02
BMI22          0.0038176385 -2.246294e-02  3.529623e-02  0.0507158023 -1.959203e-03  1.515110e+00  3.618223e-02
BMI23          0.0463573025  4.600740e-02  1.210799e-01  0.1009359117  6.368376e-03  7.268413e-01 -4.677462e-03
BMI24          0.0230542190  4.822956e-02  1.424563e-01  0.2136974371 -7.688207e-02 -4.099045e+00 -4.920604e-02
Proteinuria21  0.2564365948  2.399999e-01  2.869407e-01  0.2866854741 -3.345524e-02  7.021764e+00 -1.380307e-02
Proteinuria22  0.5114421153  2.658057e-01  2.444392e-01  0.2575295706 -5.555202e-02  2.132465e+00 -2.367527e-02
Proteinuria23  0.2658056994  8.278569e-01  2.805812e-01  0.1743841777 -5.433797e-02 -5.289189e+00 -1.905688e-02
Proteinuria24  0.2444391680  2.805812e-01  5.436426e-01  0.2272864202 -4.551615e-02  2.533664e+00 -1.962130e-02
Proteinuria25  0.2575295706  1.743842e-01  2.272864e-01  1.1656567100 -7.355628e-02  9.412580e+00 -1.330318e-01
Aetiol22      -0.0555520221 -5.433797e-02 -4.551615e-02 -0.0735562813  4.327236e-01  4.698377e+00  1.196196e-01
Aetiol23       2.1324651321 -5.289189e+00  2.533664e+00  9.4125804535  4.698377e+00  1.175992e+04  2.984111e+00
Since I'm still not very conversant with R, I really have no idea what this means. I understand that sym() is used to convert a string to a symbol, but I don't understand how (or why) such a huge matrix would be converted into a symbol. Any ideas, please?
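For reference, the iteration limit mentioned above is passed to glm.fit() through its control argument. The following is a minimal sketch on simulated data; the design matrix, the coefficients, and the choice of maxit = 100 are illustrative assumptions and are not taken from the mice source or from the dataset in the question.

```r
# Simulated logistic-regression data (illustrative only, not the question's data).
set.seed(1)
n <- 200
x <- cbind(1, rnorm(n), rnorm(n))               # design matrix with an intercept column
y <- rbinom(n, 1, plogis(drop(x %*% c(-0.5, 1, 0.5))))

# glm.fit() receives its iteration limit via glm.control().
fit <- glm.fit(x, y,
               family  = binomial(link = "logit"),
               control = glm.control(maxit = 100))

fit$converged   # whether the fitting algorithm converged within maxit iterations
fit$iter        # number of iterations actually used
```

If fit$converged is still FALSE after raising maxit, the limit is probably not the real problem; sparse factor levels or separation in the data are common alternatives to check.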
Thanks to pointers from @arun's comment, I discovered that I only needed to remove the sym() call, given what the surrounding chol() function does according to its documentation:
Compute the Choleski factorization of a real symmetric positive-definite square matrix.
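To illustrate the quoted description, here is a minimal sketch of the corrected line on a small made-up matrix. V is a hypothetical stand-in for fit.sum$cov.unscaled and is not data from the question.

```r
# A small symmetric positive-definite matrix standing in for fit.sum$cov.unscaled.
V <- matrix(c(4.0, 2.0, 0.6,
              2.0, 2.0, 0.5,
              0.6, 0.5, 3.0), nrow = 3, byrow = TRUE)

isSymmetric(V)               # check the symmetry that chol() assumes

U  <- chol(V)                # upper-triangular factor: t(U) %*% U reproduces V
rv <- t(U)                   # lower-triangular factor, i.e. the line without sym()

all.equal(rv %*% t(rv), V)   # the factor multiplies back to V
```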
I have yet to figure out why the code author put the sym() call there in the first place, though, since the code apparently breaks with it but works fine without it.
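One possible explanation, offered as an assumption rather than something confirmed in this thread: inside the mice package, sym() may refer to a small internal helper that makes a matrix exactly symmetric, whereas in a copy of the function run outside the package the name can resolve to a different sym(), such as rlang::sym(), which converts strings to symbols and would reject a matrix. The sketch below uses a hypothetical helper called symmetrize() to show what such a step would do. Note that chol() only reads the upper triangle of its argument, which would be consistent with the code working once the call is removed.

```r
# Hypothetical stand-in for a symmetrizing helper (the name is an assumption).
symmetrize <- function(x) (x + t(x)) / 2

V <- matrix(c(4, 2,
              2, 3), nrow = 2, byrow = TRUE)
V_noisy <- V
V_noisy[1, 2] <- V_noisy[1, 2] + 1e-12   # tiny asymmetry, e.g. from rounding error

identical(V_noisy, t(V_noisy))           # FALSE: off-diagonal entries differ slightly
S <- symmetrize(V_noisy)
identical(S, t(S))                       # TRUE: both off-diagonals are now the same average

rv <- t(chol(S))                         # Cholesky factor of the symmetrized matrix

# See which sym(), if any, is visible from the current environment.
if (exists("sym", mode = "function")) {
  print(environmentName(environment(sym)))
}
```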