I have a Dataset/Dataframe with a mllib.linalg.Vector
(of Doubles) as one of the columns. I would like to add another column to this dataset of type ml.linalg.Vector
to this data set (so I will have both types of Vectors
). The reason is I am evaluating few algorithms and some of those expect mllib
vector and some expect ml
vector. Also, I have to feed o/p of one algorithm to another and each use different types.
Can someone please help me convert mllib.linalg.Vector
to ml.linalg.Vector
and append a new column to the data set in hand. I tried using MLUtils.convertVectorColumnsToML()
inside an UDF
and regular functions but not able to get it to working. I am trying to avoid creating a new dataset and then doing inner join and dropping the columns as the data set will be huge eventually and joins are expensive.
You can use the method toML
to convert from mllib
to ml
vector. An UDF
and usage example can look like this:
val convertToML = udf((mllibVec: org.apache.spark.mllib.linalg.Vector) = > {
mllibVec.asML
})
val df2 = df.withColumn("mlVector", convertToML($"mllibVector"))
Assuming df
to be the original dataframe and the column with the mllib
vector to be named mllibVector
.