Tags: scala, apache-spark, spark-graphx

Spark error: missing parameter type in map()


I am trying to learn Spark GraphX on Windows 10 by replicating the code here. The code was written for an older version of Spark, and I'm unable to create the vertices. The following is the code:

import scala.util.MurmurHash
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val path = "F:/Soft/spark/2008.csv"
val df_1 = spark.read.option("header", true).csv(path)

val flightsFromTo = df_1.select($"Origin",$"Dest")
val airportCodes = df_1.select($"Origin", $"Dest").flatMap(x => Iterable(x(0).toString, x(1).toString))

// error caused by the following line
val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))

The following is the error message:

<console>:57: error: missing parameter type
       val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))
                                                                                  ^

I think the syntax is obsolete, and I tried to find the current syntax in the official documentation, but it was of no help. The data set can be downloaded from here.

UPDATE:

Basically, I'm trying to create the vertices and edges, and finally build a graph as shown in the tutorial. I'm also new to the Map-Reduce paradigm.


Solution

  • The following lines of code worked for me.

    // use the current MurmurHash3 API - scala.util.MurmurHash is deprecated
    // (the old import still works, but emits a warning)
    import scala.util.hashing.MurmurHash3
    
    // convert the Datasets to RDDs - this is the cause of the error:
    // Dataset.map is overloaded (Scala and Java variants), so the compiler
    // cannot infer the lambda's parameter type; RDD.map has no such ambiguity
    val flightsFromTo = df_1.select($"Origin", $"Dest").rdd
    val airportCodes = df_1.select($"Origin", $"Dest").rdd.flatMap(x => Iterable(x(0).toString, x(1).toString))
    
    val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash3.stringHash(x), x))
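
  • For anyone who wants to check the hashing step outside a Spark shell, here is a minimal plain-Scala sketch. The airport codes are made up for illustration; GraphX's `VertexId` is just a type alias for `Long`, which is why the `Int` hash can serve as a vertex id.

    ```scala
    import scala.util.hashing.MurmurHash3

    // Hypothetical airport codes standing in for the distinct Origin/Dest values.
    val codes = Seq("SFO", "JFK", "ORD")

    // Same transformation as the fixed map(): hash each code to an Int,
    // widened to Long - the underlying type of GraphX's VertexId alias.
    val vertices: Seq[(Long, String)] = codes.map(c => (MurmurHash3.stringHash(c).toLong, c))

    // MurmurHash3 is deterministic, so re-running the job assigns the
    // same id to the same airport code every time.
    assert(MurmurHash3.stringHash("SFO") == MurmurHash3.stringHash("SFO"))
    assert(vertices.map(_._1).distinct.size == codes.size)
    ```

    Because the id is derived from the code itself rather than from `zipWithIndex`, the vertex ids stay stable across runs and partitionings, which matters when the edge RDD is built separately from the same strings.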