In R, we have the function factor()
. I would like to use this function in a parallelized way, with SparkR.
My version of Spark is 1.6.2, and I cannot find an equivalent in the documentation. I thought I could do it with a map, but I am not certain I understand that answer, and there should be an easier way.
So, to put it simply: what is the equivalent of factor()
in SparkR?
There is no direct equivalent. Spark ML encodes every variable, including categorical ones, as double-precision numbers and uses column metadata to distinguish the different types, so there is no dedicated factor type to convert to. For ML algorithms you can use the R formula interface, which encodes string (categorical) columns automatically.
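For example, a minimal sketch against SparkR 1.6 (this assumes a local Spark installation; the `glm` call follows the SparkR 1.6 documentation, where string columns such as `Species` are encoded automatically by the formula, much like `factor()` would do in base R):

```r
library(SparkR)

# Initialize Spark (SparkR 1.6 API)
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Create a Spark DataFrame from a local data.frame; note that
# SparkR replaces "." in column names with "_" (Sepal.Length -> Sepal_Length)
df <- createDataFrame(sqlContext, iris)

# Species is a string column; the formula encodes it as a categorical
# feature automatically, so no explicit factor() call is needed
model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian")
summary(model)
```

So rather than converting a column up front, you let the formula (or the ML pipeline) handle the encoding at fit time.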