I am trying to upload/import a CSV file into MongoDB on my local machine, and this is what I am trying:
    from pyspark.sql import SparkSession
    from pyspark.conf import SparkConf

    conf = SparkConf() \
        .setAppName("MongoDB") \
        .setMaster("local[*]") \
        .set("spark.mongodb.input.uri", "mongodb://localhost:27017/Scrub_Data.RPT_AR") \
        .set("spark.mongodb.output.uri", "mongodb://localhost:27017/Scrub_Data.RPT_AR")

    spark = SparkSession.builder \
        .config(conf=conf) \
        .getOrCreate()

    df = spark.read.csv("mypathtocsvfile", header=True, inferSchema=True)

    df.write \
        .format("com.mongodb.spark.sql.DefaultSource") \
        .mode("append") \
        .option("uri", "mongodb://localhost:27017/Scrub_Data.RPT_AR") \
        .save()
The above code throws a Py4JJavaError:

    An error occurred while calling o39.save. : java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource
The error message means the MongoDB Spark Connector is not on Spark's classpath: Spark cannot resolve the data source class `com.mongodb.spark.sql.DefaultSource` because the connector jar was never loaded. Installing MongoDB itself is not enough; you have to tell Spark to pull in the connector package, either with the `--packages` flag on `spark-submit`/`pyspark`, or via the `spark.jars.packages` configuration property when you build the session.
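A minimal sketch of the fix, assuming a Spark build on Scala 2.12 and connector version 3.0.1 (pick the artifact suffix and version that match your own Spark installation — e.g. `_2.11` for older builds; the `input.uri`/`output.uri` keys below belong to the 3.x connector, while the 10.x connector uses different keys and the `"mongodb"` format name instead):

```python
from pyspark.sql import SparkSession

# spark.jars.packages makes Spark download the connector (and its
# dependencies) from Maven Central before the session starts.
spark = (
    SparkSession.builder
    .appName("MongoDB")
    .master("local[*]")
    # Assumed coordinates: Scala 2.12 build of connector 3.0.1.
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")
    .config("spark.mongodb.input.uri",
            "mongodb://localhost:27017/Scrub_Data.RPT_AR")
    .config("spark.mongodb.output.uri",
            "mongodb://localhost:27017/Scrub_Data.RPT_AR")
    .getOrCreate()
)

df = spark.read.csv("mypathtocsvfile", header=True, inferSchema=True)

# With the connector on the classpath, the original save() now resolves.
df.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .save()
```

Equivalently, you can leave your script unchanged and pass the package on the command line, e.g. `spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 yourscript.py`. Note that `spark.jars.packages` only takes effect when the session is first created, so restart any running Spark session after adding it.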