Following the instructions in https://learn.microsoft.com/en-us/azure/databricks/dev-tools/vscode-ext/dev-tasks/databricks-connect, I tried to run the example code from https://learn.microsoft.com/en-us/azure/databricks/dev-tools/vscode-ext/tutorial. When it reaches the 'show' method, I get the following error in my VS Code terminal. The same error happens when I run it in a Jupyter notebook.
Has anyone come across this issue and resolved it?
Below are the error message and the code I'm running:
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.UNIMPLEMENTED details = "Method not found: spark.connect.SparkConnectService/ReattachExecute" debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Method not found: spark.connect.SparkConnectService/ReattachExecute", grpc_status:12, created_time:"2023-10-02T22:47:34.7298799+00:00"}"
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField('CustomerID', IntegerType(), False),
    StructField('FirstName', StringType(), False),
    StructField('LastName', StringType(), False)
])

data = [
    [1000, 'Mathijs', 'Oosterhout-Rijntjes'],
    [1001, 'Joost', 'van Brunswijk'],
    [1002, 'Stan', 'Bokenkamp']
]

customers = spark.createDataFrame(data, schema)
customers.show()
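The version of Databricks Connect must match the cluster's Databricks Runtime version. A "Method not found" error with StatusCode.UNIMPLEMENTED means the client called an RPC (here ReattachExecute) that the Spark Connect server on the cluster does not implement, which typically happens when the installed databricks-connect package is newer than the cluster's runtime. This is mentioned in the documentation: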
The Databricks Connect major and minor package version should match your Databricks Runtime version. Databricks recommends that you always use the most recent package of Databricks Connect that matches your Databricks Runtime version. For example, when you use a Databricks Runtime 14.0 cluster, you should also use the databricks-connect==14.0.* package.
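To check for this mismatch locally, here is a minimal sketch that compares the major.minor part of the installed databricks-connect package (read with the standard library's importlib.metadata) against a Databricks Runtime version string. The helper names major_minor, versions_match and check_installed_client are hypothetical, and the runtime string format (for example "13.3.x-scala2.12") is an assumption: copy the actual value from your cluster's configuration rather than relying on the placeholder below.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def major_minor(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '14.0.1' or '13.3.x-scala2.12'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Cannot parse a major.minor version from {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(client_version: str, runtime_version: str) -> bool:
    """True when the client package and the runtime share the same major.minor."""
    return major_minor(client_version) == major_minor(runtime_version)


def check_installed_client(runtime_version: str) -> None:
    """Compare the locally installed databricks-connect package with a runtime version."""
    try:
        client_version = version("databricks-connect")
    except PackageNotFoundError:
        print("databricks-connect is not installed in this environment")
        return
    if versions_match(client_version, runtime_version):
        print(f"OK: databricks-connect {client_version} matches runtime {runtime_version}")
    else:
        major, minor = major_minor(runtime_version)
        # Suggest a pin following the documented 'databricks-connect==X.Y.*' pattern.
        print(
            f"Mismatch: databricks-connect {client_version} vs runtime {runtime_version}; "
            f'try: pip install --upgrade "databricks-connect=={major}.{minor}.*"'
        )


if __name__ == "__main__":
    # Hypothetical placeholder runtime string; replace it with your cluster's actual value.
    check_installed_client("14.0.x-scala2.12")
```

Only major.minor is compared, because the documentation quoted above ties compatibility to those two components and recommends the most recent patch release within them.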