scala, apache-spark, spark-graphx

VertexRDD giving me type mismatch error


I am running the following code, attempting to create a Graph in GraphX in Apache Spark.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId

//loads the graph file from HDFS
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/data/google-plus/2309.graph")

//maps each line to its first 20 characters, which is the node id
val result = lines.map(line => line.substring(0, 20))

//pairs each node id, parsed as a Long, with the value 1L
val result2 = result.map(word => (word.toLong, 1L))

//where I am getting an error
val vertexRDD: RDD[(Long, Long)] = sc.parallelize(result2)

I am getting the following error:

 error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, Long)]
 required: Seq[?]
Error occurred in an application involving default arguments.
         val vertexRDD: RDD[(Long, Long)] = sc.parallelize(result2)

Solution

  • First, your two maps can be combined into a single one:

    val vertexRDD: RDD[(Long, Long)] = 
      lines.map(line => (line.substring(0, 20).toLong, 1L))
    

    Now, to your error: you cannot call sc.parallelize on an RDD; it expects a local Scala collection (a Seq), and result2 is already a distributed RDD. Your vertex RDD is therefore already defined by result2, so there is nothing to parallelize. You can create your graph directly from result2 and your edges RDD (see the end-to-end sketch after this answer for one way to build edgesRDD):

    val g = Graph(result2, edgesRDD)
    

    or, if using my suggestion:

    val g = Graph(vertexRDD, edgesRDD)
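
    For completeness, here is a minimal end-to-end sketch. It reuses sc and lines from the snippets above and assumes a hypothetical edge file (the .edges path and its whitespace-separated "srcId dstId" line format are assumptions for illustration, not from the original post):

    import org.apache.spark.graphx.{Edge, Graph, VertexId}
    import org.apache.spark.rdd.RDD

    // Vertices: (VertexId, attribute) pairs, built as in the answer above.
    val vertexRDD: RDD[(VertexId, Long)] =
      lines.map(line => (line.substring(0, 20).toLong, 1L))

    // Edges: assumes a hypothetical whitespace-separated edge list, e.g. "123 456" per line.
    // The path and format are illustrative only.
    val edgesRDD: RDD[Edge[Int]] =
      sc.textFile("hdfs://moonshot-ha-nameservice/data/google-plus/2309.edges")
        .map { line =>
          val fields = line.split("\\s+")
          Edge(fields(0).toLong, fields(1).toLong, 1) // edge attribute is just a placeholder
        }

    // Build the graph and run a quick sanity check.
    val g = Graph(vertexRDD, edgesRDD)
    println(s"vertices: ${g.numVertices}, edges: ${g.numEdges}")

    If the edge file really is a plain edge list, GraphLoader.edgeListFile(sc, path) can also build the graph in one call, with an attribute of 1 on every vertex and edge.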