Scala: GraphX: error: class Array takes type parameters


I am trying to build an Edge RDD for GraphX. I read a CSV file into a DataFrame and then try to convert it to an Edge RDD:

val staticDataFrame = spark.
  read.
  option("header", true).
  option("inferSchema", true).
  csv("/projects/pdw/aiw_test/aiw/haris/Customers_DDSW-withDN$.csv")

val edgeRDD: RDD[Edge[(VertexId, VertexId, String)]]  = 
  staticDataFrame.select(
    "dealer_customer_number",
    "parent_dealer_cust_number",
    "dealer_code"
  ).map{ (row: Array) => 
    Edge((
      row.getAs[Long]("dealer_customer_number"), 
      row.getAs[Long]("parent_dealer_cust_number"),
      row("dealer_code")
    ))
  }

But I am getting this error:

<console>:81: error: class Array takes type parameters
       val edgeRDD: RDD[Edge[(VertexId, VertexId, String)]]  = staticDataFrame.select("dealer_customer_number", "parent_dealer_cust_number", "dealer_code").map((row: Array) => Edge((row.getAs[Long]("dealer_customer_number"), row.getAs[Long]("parent_dealer_cust_number"), row("dealer_code"))))
                                                                                                                                                                      ^

The result for

staticDataFrame.select("dealer_customer_number", "parent_dealer_cust_number", "dealer_code").take(1)

is

res3: Array[org.apache.spark.sql.Row] = Array([0000101,null,B110])

Solution

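  • First, Array is a generic class, so it cannot be used as a type on its own; you would have to write Array[Something]. That is what the compiler is complaining about, but Array is probably not the type you want here anyway.

    A DataFrame is a Dataset[Row], not a Dataset[Array[_]], so each element passed to map is a Row. Therefore you have to change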

    .map{ (row: Array) => 
    

    to

    .map{ (row: Row) =>
    

    Or omit the type annotation entirely and let the compiler infer it:

    .map{ row =>
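
Changing the parameter type is probably only the first step. Edge takes three arguments (srcId, dstId, attr) rather than a single tuple, map on a DataFrame returns a Dataset rather than an RDD, and the sample row shows a null parent. A minimal sketch of how the whole conversion might look is below. It makes several assumptions that are not in the question: toEdges is a hypothetical helper name, dealer_code is used as the edge attribute (so the result is RDD[Edge[String]] rather than the type declared in the question), rows with a null in any of the three columns are skipped, and both id columns hold values that parse as Long.

```scala
import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper (not from the question): turns the three selected
// columns into GraphX edges, using dealer_code as the edge attribute.
def toEdges(df: DataFrame): RDD[Edge[String]] =
  df.select("dealer_customer_number", "parent_dealer_cust_number", "dealer_code")
    .na.drop() // assumption: skip rows where any of the three columns is null
    .rdd       // switch to RDD[Row] so that map yields an RDD, not a Dataset
    .map { (row: Row) =>
      Edge(
        row.get(0).toString.toLong, // assumption: the ids parse as Long
        row.get(1).toString.toLong,
        row.get(2).toString         // dealer_code becomes the edge attribute
      )
    }
```

Pasted into spark-shell, it would be used as val edgeRDD = toEdges(staticDataFrame). Going through toString.toLong works whether inferSchema produced a numeric or a string column, but it throws a NumberFormatException if an id is not numeric, so check the inferred schema with staticDataFrame.printSchema() first.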