
PySpark Mongodb / java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame


I'm trying to connect PySpark to MongoDB with the following (running on Databricks):

from pyspark import SparkConf, SparkContext
from pyspark.mllib.recommendation import ALS
from pyspark.sql import SQLContext
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

but I get this error:

java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame

I am using Spark 2.0 with the mongo-spark-connector build for Scala 2.11, and I have defined `spark.mongodb.input.uri` and `spark.mongodb.output.uri`.


Solution

  • I managed to make it work: I was using mongo-spark-connector_2.10-1.0.0 instead of mongo-spark-connector_2.10-2.0.0. The 1.0.x connector is compiled against Spark 1.x, where `org.apache.spark.sql.DataFrame` exists as a real class; in Spark 2.x, `DataFrame` is only a type alias for `Dataset[Row]`, so that class no longer exists on the classpath and loading the old connector fails with `NoClassDefFoundError`. The connector's major version (and its Scala build) must match the Spark version of the cluster.
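For reference, here is a minimal sketch of wiring up a matching connector outside Databricks, via `spark-submit`. It assumes Spark 2.0 on Scala 2.11; the MongoDB URI and script name are placeholders, not values from the question:

```shell
# Attach the Mongo Spark connector whose Scala build (2.11) and major
# version (2.0.0) match the cluster's Spark 2.0 runtime.
# mongodb://host:27017/mydb.mycoll and your_script.py are placeholders.
spark-submit \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 \
  --conf "spark.mongodb.input.uri=mongodb://host:27017/mydb.mycoll" \
  --conf "spark.mongodb.output.uri=mongodb://host:27017/mydb.mycoll" \
  your_script.py
```

On Databricks, the equivalent is attaching the Maven coordinate `org.mongodb.spark:mongo-spark-connector_2.11:2.0.0` to the cluster as a library instead of passing `--packages`.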