I am trying to connect a Python notebook on an Azure Databricks cluster to a Cosmos DB database that uses the MongoDB API.
I'm using the mongo-spark-connector 2.11:2.4.2 with Python 3.
My code is as follows:
ReadConfig = {
    "Endpoint": "https://<my_name>.mongo.cosmos.azure.com:443/",
    "Masterkey": "<my_key>",
    "Database": "database",
    "preferredRegions": "West US 2",
    "Collection": "collection1",
    "schema_samplesize": "1000",
    "query_pagesize": "200000",
    "query_custom": "SELECT * FROM c"
}
df = spark.read.format("mongo").options(**ReadConfig).load()
df.createOrReplaceTempView("dfSQL")
The error I get is:

Could not initialize class com.mongodb.spark.config.ReadConfig$

How can I resolve this?
Answering my own question.

Using Maven as the source, I installed the correct library on my cluster with the coordinate

org.mongodb.spark:mongo-spark-connector_2.11:2.4.0

(for Spark 2.4).
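The <URI> placeholder below is a standard MongoDB connection string. For Cosmos DB's MongoDB API it can be assembled from the account name and primary key; here is a minimal sketch (the helper name is my own, and the port 10255 / ssl=true format should be verified against the connection string shown in the Azure portal for your account):

```python
from urllib.parse import quote_plus

def cosmos_mongo_uri(account: str, key: str) -> str:
    # Hypothetical helper (not part of the connector): assembles a
    # Cosmos DB MongoDB-API connection string. The key must be
    # URL-encoded because primary keys usually end in "==".
    return (
        f"mongodb://{account}:{quote_plus(key)}"
        f"@{account}.mongo.cosmos.azure.com:10255/?ssl=true"
    )

uri = cosmos_mongo_uri("myaccount", "myPrimaryKey==")
```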
An example of the code I used is as follows (for those who want to try):
# Read configuration
readConfig = {
    "URI": "<URI>",
    "Database": "<database>",
    "Collection": "<collection>",
    "ReadingBatchSize": "<batchSize>"
}

# Aggregation pipeline applied on the server side
pipelineAccounts = "{'$sort' : {'account_contact': 1}}"

# Connect via the MongoDB Spark connector to create a Spark DataFrame
accountsTest = (spark.read
    .format("com.mongodb.spark.sql")
    .options(**readConfig)
    .option("pipeline", pipelineAccounts)
    .load())

accountsTest.select("account_id").show()
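The "pipeline" option takes a JSON string. Hand-writing it works for a single stage, but building the pipeline as Python data and serializing it with json.dumps avoids quoting mistakes as it grows. A sketch, independent of Spark (the $match stage is an illustrative assumption, not part of the example above):

```python
import json

# Build the aggregation pipeline as Python data, then serialize it
# for the connector's "pipeline" option. The $sort stage mirrors the
# example above; the $match stage is illustrative.
pipeline = [
    {"$match": {"account_contact": {"$exists": True}}},
    {"$sort": {"account_contact": 1}},
]
pipeline_json = json.dumps(pipeline)
```

The result can then be passed as .option("pipeline", pipeline_json) in place of the hand-written string.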