Tags: scala, apache-spark, rdf, turtle-rdf, sansa

Why does Spark fail with "value rdf is not a member of org.apache.spark.sql.SparkSession"?


I am trying to use SANSA-RDF to read Turtle RDF files into Spark and create a graph. I get an error when I execute the following code. What am I missing?

    import org.apache.jena.query.QueryFactory
    import org.apache.jena.riot.Lang
    import org.apache.spark.sql.SparkSession
    import net.sansa_stack.rdf.spark.io.rdf._
    import net.sansa_stack.rdf.spark.io._
    import scala.io.Source

    object SparkExecutor {
      private var ss:SparkSession = null

      def ConfigureSpark(): Unit ={

        ss = SparkSession.builder
          .master("local[*]")
          .config("spark.driver.cores", 1)
          .appName("LAM")
          .getOrCreate()

      }

      def createGraph(): Unit ={
        val filename = "xyz.ttl"
        print("Loading graph from file"+ filename)
        val lang = Lang.TTL
        val triples = ss.rdf(lang)(filename)
        val graph = LoadGraph(triples)    
      }
    }

I am calling the SparkExecutor from main function using

    object main {
      def main(args: Array[String]): Unit = {
        SparkExecutor.ConfigureSpark()
        val RDFGraph = SparkExecutor.createGraph()
      }
    }

This results in the following error

    Error: value rdf is not a member of org.apache.spark.sql.SparkSession
    val triples = ss.rdf(lang)

Solution

  • There is an implicit conversion at play. If you look at the SANSA-RDF source code in

    sansa-rdf-spark/src/main/scala/net/sansa_stack/rdf/spark/io/package.scala:159
    

    you will see that rdf(lang) is not a method of SparkSession but of the implicit class RDFReader, so you need to import the package where that implicit definition is in scope. Please try adding

    import net.sansa_stack.rdf.spark.io._

    and let us know the result.
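
To illustrate the mechanism behind the fix, here is a minimal, self-contained sketch of the implicit-class pattern SANSA uses. The names `Session` and `Reader` below are hypothetical stand-ins, not actual SANSA or Spark classes; the point is only that an extension method like `rdf(...)` compiles on a type it was never declared on, but solely when the enclosing package's implicits have been imported:

```scala
object io {
  // A plain class with no rdf(...) method of its own,
  // playing the role of SparkSession in this sketch.
  class Session(val name: String)

  // The implicit class adds rdf(lang)(path) as an extension method on Session.
  // It is only applicable where this definition is in scope.
  implicit class Reader(session: Session) {
    def rdf(lang: String)(path: String): String =
      s"reading $path as $lang via ${session.name}"
  }
}

object Demo extends App {
  // Without this import, session.rdf(...) fails to compile with the same
  // kind of error: "value rdf is not a member of io.Session".
  import io._

  val session = new Session("local")
  println(session.rdf("TTL")("xyz.ttl"))
}
```

This is why the error message is a compile-time "value rdf is not a member of org.apache.spark.sql.SparkSession" rather than a runtime failure: the compiler searches for an applicable implicit conversion only among those visible at the call site.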