
Convert Dataframe with Vector column to Dataset - which type to be used in the case class


I have a DataFrame with a column of vector type, produced by a one-hot encoder. Let's name the column vector.

With a case class Example(vector: WhichType), I want to map the dataframe to a Dataset:

val ds = dataframe.as[Example]

The question is: which type should the property 'vector' in the case class have?

I get an error message:

need an array field but got struct<type:tinyint,size:int,indices:array<int>,values:array<double>>;


Solution

  • If you're using Spark ML, then you can use the Vector type imported below:

    import org.apache.spark.ml.linalg.Vector
    
    case class Example(vector: Vector)
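
    A minimal end-to-end sketch of the conversion, assuming Spark 2.x or later with spark-mllib on the classpath. The object name VectorDatasetSketch, the build helper, and the hand-built rows are hypothetical stand-ins for the real one-hot encoder output; only the Vector import and the Example case class come from the answer above.

    ```scala
    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import org.apache.spark.sql.{Dataset, SparkSession}

    // Defined at top level (not inside a method) so Spark can derive an encoder for it.
    case class Example(vector: Vector)

    object VectorDatasetSketch {
      // Hypothetical helper: builds a small DataFrame and converts it to Dataset[Example].
      def build(spark: SparkSession): Dataset[Example] = {
        import spark.implicits._
        // Stand-in for one-hot encoder output: a single column named "vector".
        val dataframe = Seq(
          Tuple1(Vectors.sparse(3, Array(1), Array(1.0))),
          Tuple1(Vectors.dense(0.0, 0.0, 1.0))
        ).toDF("vector")
        // The column name matches the case class field, so as[Example] resolves it.
        dataframe.as[Example]
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("vector-dataset-sketch")
          .getOrCreate()
        try {
          val ds = build(spark)
          ds.printSchema()
          ds.collect().foreach(e => println(e.vector))
        } finally spark.stop()
      }
    }
    ```

    The column name in the DataFrame has to match the field name in the case class; if the encoder output column is named differently, rename it (for example with withColumnRenamed) before calling as[Example].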