
Error while trying to Import CSV to MongoDB using PySpark


I am trying to import a CSV file into MongoDB on my local machine, and this is what I have:

from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

conf = SparkConf() \
    .setAppName("MongoDB") \
    .setMaster("local[*]") \
    .set("spark.mongodb.input.uri", "mongodb://localhost:27017/Scrub_Data.RPT_AR") \
    .set("spark.mongodb.output.uri", "mongodb://localhost:27017/Scrub_Data.RPT_AR")

spark = SparkSession.builder \
    .config(conf=conf) \
    .getOrCreate()


df = spark.read.csv("mypathtocsvfile", header=True, inferSchema=True)

df.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("uri", "mongodb://localhost:27017/Scrub_Data.RPT_AR") \
    .save()

The above code throws a Py4JJavaError: An error occurred while calling o39.save. : java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource


Solution

  • The ClassNotFoundException means that the MongoDB Spark Connector is not on Spark's classpath, so Spark cannot resolve the com.mongodb.spark.sql.DefaultSource data source. You need to supply the connector package when launching your Spark application, either with the --packages flag of spark-submit or via the spark.jars.packages configuration property.
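One way to supply the connector is at submit time; a minimal sketch, assuming Spark 3.0.x built against Scala 2.12 (the `_2.12` suffix and the `3.0.1` version are assumptions; pick the coordinates that match your Spark and Scala versions, and `your_script.py` is a placeholder for your script's path):

```shell
# Pass the MongoDB Spark Connector coordinates to spark-submit;
# Spark resolves and downloads the jar (and its dependencies)
# from Maven Central before starting the job.
spark-submit \
  --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 \
  your_script.py
```

Alternatively, you can set the same coordinates in code by adding `.set("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")` to the SparkConf before the SparkSession is created; this only takes effect if no session is already running.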