Is it possible to switch workspace with the use of databricks-connect?
I'm currently trying to switch with: spark.conf.set('spark.driver.host', cluster_config['host'])
But this gives back the following error:
AnalysisException: Cannot modify the value of a Spark config: spark.driver.host
If you look into the documentation on configuring the client, you will see that there are three methods to configure Databricks Connect:

- databricks-connect configure - the file name is always ~/.databricks-connect
- environment variables - DATABRICKS_ADDRESS, DATABRICKS_API_TOKEN, ...
- Spark configuration properties - spark.databricks.service.address, spark.databricks.service.token, ... But when using this method, the Spark session may already be initialized, so you may not be able to switch on the fly without restarting Spark.

But if you use different DBR versions, then it's not enough to change the configuration properties; you also need to switch to a Python environment that contains the corresponding version of the Databricks Connect distribution.
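As a minimal sketch of the second method (the concrete host, token and IDs are placeholders, not values from this answer), you would export the variables before starting the Python process that creates the Spark session:

# Placeholders - substitute your own workspace values
export DATABRICKS_ADDRESS="https://<workspace-instance>"
export DATABRICKS_API_TOKEN="<personal-access-token>"
export DATABRICKS_CLUSTER_ID="<cluster-id>"
export DATABRICKS_ORG_ID="<org-id>"
export DATABRICKS_PORT="15001"   # port used by Databricks Connect; 15001 is a common default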
For my own work I wrote the following Zsh script that allows easy switching between different setups (shards), although it only allows using one shard at a time. Prerequisites are:

- databricks-connect is installed into the activated conda environment <name>-shard with:

pyenv activate field-eng-shard
pip install -U databricks-connect==<DBR-version>

- a ~/.databricks-connect-<name> file that will be symlinked to ~/.databricks-connect (one way to create it is sketched just below)
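For example, the per-shard file could be created (this is one possible workflow, not the only one) by running the interactive configuration once per shard and saving the result under a shard-specific name, shown here for a hypothetical field-eng shard:

# Assumption: generate the config interactively, then stash it per shard
databricks-connect configure
mv ~/.databricks-connect ~/.databricks-connect-field-eng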
function use-shard() {
  SHARD_NAME="$1"
  if [ -z "$SHARD_NAME" ]; then
    echo "Usage: use-shard shard-name"
    return 1
  fi
  # Refuse to overwrite a real config file that is not one of our symlinks
  if [ ! -L ~/.databricks-connect ] && [ -f ~/.databricks-connect ]; then
    echo "There is a ~/.databricks-connect file - possibly you configured another shard"
  elif [ -f ~/.databricks-connect-${SHARD_NAME} ]; then
    # Point ~/.databricks-connect at the shard-specific config...
    rm -f ~/.databricks-connect
    ln -s ~/.databricks-connect-${SHARD_NAME} ~/.databricks-connect
    # ...and switch to the Python environment with the matching databricks-connect version
    pyenv deactivate
    pyenv activate ${SHARD_NAME}-shard
  else
    echo "There is no configuration file for shard: ~/.databricks-connect-${SHARD_NAME}"
  fi
}
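For example, running the following switches the symlink to ~/.databricks-connect-field-eng and activates the field-eng-shard environment:

use-shard field-eng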